PREFACE TO THE SECOND EDITION

Striking examples of the utility and scope of the science of sta- 
tistics have occurred in recent years. As Professor Harold Hotelling 
remarks (Annals of Mathematical Statistics, vol. 11, 1940, pp. 457-470):

Indeed it seems as if the exploitation of the business and 
manufacturing possibilities of statistical methods has only 
begun and that limitless further fields are coming into view. 

The widespread use of statistical methods and the gratifying in- 
terest shown in the present book have made possible a second edition 
at this time. This opportunity has been used to polish and clarify
certain portions of the text. For suggestions leading to the excision
of obscurities I am indebted to many of my students and to a number
of friends in other universities, particularly to Professors Irving W.
Burr, John H. Curtiss, Henry Scheffé, Guy G. Speeker, and Howard
E. Wahlert. Of course, full responsibility for any remaining errors
or other defects is my own. 


J. F. K. 




PREFACE TO THE FIRST EDITION 

The field of statistics is many sided and ranges over different levels. 
However, between the levels of clerical work at one extreme and 
mathematical research at the other extreme, there is a well-defined 
methodology, mathematical in nature, which underlies the specialized 
applications in the departments of economics, psychology, education, 
and biology. 

This book is an elementary text dealing with the mathematics of 
statistics. Fortunately, a considerable part of the descriptive meth- 
odology of statistics can be understood by those having relatively 
little knowledge of college mathematics. Although no mathematics 
beyond the ordinary Freshman course in college algebra is required 
for a profitable reading of this text, a certain degree of mathematical 
maturity and intelligence is presupposed. To achieve the maximum 
success perhaps only the best of those students whose mathematical 
preparation is limited to the minimum prerequisite should be encour- 
aged to study it. Occasionally, material is introduced to sharpen 
the interest and challenge the ability of the more advanced student 
without interrupting the main developments or discouraging those 
less mature. 

In writing this book, considerable selection of material necessarily 
had to be made. The omission of certain topics will be noted in the 
table of contents. Judging from my own experience, and that of 
others, the theory of sampling cannot be taught satisfactorily at the 
level for which Part I is intended. At best only a superficial use of 
formulas could be hoped for. Consequently, I have elected to defer 
this subject to Part II where a systematic treatment can be given. 
With regard to time series analysis, Professor J. Neyman says in his 
Lectures And Conferences On Mathematical Statistics (p. 106), 

We start by trying to split each of the series into several parts, which we 
arbitrarily assume to be additive. One of these parts is the trend, which we 
estimate perhaps by fitting a low order parabola to the whole series available. 
The next part is the “business cycle.” The third part is the “seasonal variation,”
which we frequently estimate by calculating moving averages. Finally,
the remainder is considered to arise from random causes, and we concentrate 
on the question whether such a remainder in one of the variables is correlated 
with that in some other. All this procedure seems to me very artificial and 
arbitrary. ... In my opinion the whole problem of time series must be treated 
from a point of view that is quite different from the traditional one just described. 

I concur in this opinion and I believe that no useful purpose would 
be served by drilling students in the traditional procedures. 

Throughout the book the student is encouraged and stimulated to 
master fundamental principles and concepts. Essentially, the job 
of every statistician is to take hold of situations and disentangle 
them by the techniques of the science. Therefore, considerable 
emphasis is placed on technique. I have tried to develop in the 
student the ability to use symbolism creatively as a language.
Numerous examples are given to clarify concepts and illustrate 
processes. Over two hundred exercises are included. It is intended 
that these exercises should be handled as in a mathematics course. 
No laboratory, so-called, is necessary. 

Nowadays, no little importance is attached to motivation. I have 
constantly held in mind the necessity of making the subject interest- 
ing and stimulating to the beginning student. Nevertheless, I ven- 
ture the opinion that the best motivation for intelligent students is 
the feeling that their teacher knows his subject. 

In preparing the manuscript a large number of books and papers 
have been examined and perhaps leaned upon. No claim to origi- 
nality is made except possibly in the matter of arrangement and 
pedagogical approach. Numerous references to the scholarly achieve-
ments of others are cited. It is hoped that the serious student will
read some of these and thereby widen his perspective and enhance 
his interest. 

In conclusion, I wish to express my deep appreciation to Professor 
Allen T. Craig and Dr. Mason E. Wescott who critically read the 
manuscript and made many suggestions for its improvement. 


April, 1939 


John F. Kenney 
MATHEMATICS OF STATISTICS

INTRODUCTION 

1. Definition. The word statistics is used in at least two different 
senses. Construed as plural it refers to the systematic presentation 
of quantitative data. Used in a singular sense, the word statistics 
refers to the science which has for its object the classification and 
analysis of quantitative data so that intelligent judgments may be 
passed upon them. 

It is usually clear from the context which meaning 1 is intended, 
although some persons prefer the expression statistical methods 
for this second meaning. Statistical methods are all those devices 
used in the collection and analysis of data. The theory of statis-
tics is the exposition of statistical methods and is of a mathematical
nature. 

2. Scope. There used to be a widespread misapprehension that 
statistics is a branch of economics. As a matter of fact, statistical 
problems arise in many different fields — biology, economics, engi- 
neering, insurance, education, physics, and astronomy, as well as 
various branches of business. The exploration of certain aspects of 
nearly every field involves some phase of statistical theory. Indeed, 
certain types of statistical methodology may have almost unexpected 
applications — the discovery, for example, that the life of physical 
property 2 is governed by much the same statistical rules as govern 
the lives of human beings, and hence, that life tables may be applied 
to both. Physicists have discovered that many of the problems in 
the modern theory of the structure of the atom are essentially sta-
tistical in nature. In recent years industrial companies have placed
an increasing reliance on statistical methods in controlling the 
quality of goods during manufacture. 

Statistics as a science is making contributions to all the sciences. 
On the other hand, some sciences like biometry and physics have 

1 In addition to the two meanings given above, another has crept into the 
recent literature where reference is made to a statistic. This term will be ex- 
plained later. 

2 Life Expectancy of Physical Property, by E. B. Kurtz. Ronald Press.
contributed much in the development of statistics and its terminology.
The following quotation from Science may appropriately be men- 
tioned here: 

The extension of the scope of quantitative methods through the medium of 
statistical analysis is one of the most significant things going on in the scientific 
world at the present time. 1 

The importance of statistical method in present-day thinking has 
been well stated, as follows: 

More and more the modern temper relies upon statistical method in its at- 
tempts to understand and to chart the workings of the world in which we live. 
Particularly in those sciences which deal with human beings, whether in their 
physical and biological aspects or in their social, economic, and psychological 
relations, the spirit of our time asks that its conclusion be based not so much 
upon the distinctive reactions of one or two individuals as upon the observation
of large numbers of individuals, the measurement of their common likenesses and 
the extent of their diversity. As the data thus gathered from mass phenomena 
become extensive, it becomes imperative to have methods of organization to 
bring the facts within the compass of our understanding, methods of analysis 
to make the essential relations appear out of the mass of detail in which they 
are hidden, and methods of classification and description to facilitate the pres- 
entation of the data for the study and consideration of other persons. Thus 
statistical method becomes a telescope through which we can study a larger 
terrain than would be accessible to our unaided vision. 2 

3. Statistical Methods in the Social Sciences. Because statistics 
is fundamentally the study of aggregates of individuals, rather than 
of individuals, whether these individuals be observations or measure- 
ments or persons, it is apparent that statistical methods are essential 
to social studies. Indeed it has been said that it is principally by the 
aid of such methods that these studies may be raised to the rank of 
sciences. 

This particular dependence of social studies upon statistical methods 
is mentioned in a recent book 3 from which we quote the following: 

If, as seems probable, our present uncoordinated large-scale business is to be 
further developed into an efficiently managed instrument of production serving 
the needs of the people, then statistics, together with mathematical economics, 
will emerge among the most important tools of the social sciences. For it is by 

1 Science, January 18, 1929.

2 Mathematics and Statistics — Walker. Sixth Yearbook, National Council of 
Teachers of Mathematics. 

3 Reprinted by permission from Methods of Statistical Analysis by Davies and 
Crowder, published by John Wiley and Sons, Inc. 
means of averages, dispersions, coefficients of variability, trends, and regressions,
as pictured in control charts, that management is able to visualize and direct the 
movements of large masses of population. 

The work of the statistician is much like that of the map maker who presents 
the traveler with a sketch of important highways, showing the locations of towns 
and geographical features. The map is not a picture of reality. It shows cities 
as dots, and rivers as lines. It has purposely omitted the interesting details of 
scenery and the still more important features of human interest which lie along 
the route and which constitute the traveler’s real objectives. Nevertheless, as
a means of reaching these objectives, the map is extremely useful. And so it is 
with statistics in the hands of the business executive and statesman. Back of the 
charts are human beings with their varying characteristics and vital interests, 
few of which can be described in figures. Yet as a means of serving these interests,
of keeping trade moving from one region to another, of allocating investment and 
labor, and of apportioning relief to maladjusted industries and dependent classes, 
statistics and mathematical methods are important, and are becoming increas-
ingly important with the growing complexity of society.

It may be said that the study of statistics is not merely an attempt to de-
scribe what actually occurs, though it must begin at this point, but in its broader
aspects it is the logical background of business and social management. Hence
what appears now to be mere abstraction may later become the basic necessity
of an applied science. Eventually, it may be assumed, the social arts of business 
and politics will rest upon as substantial a theoretical and mathematical back- 
ground as physics, chemistry, and engineering. 

4. Mathematics and Statistics. Statistical problems are of inter- 
est, therefore, not only to the worker in the particular field but also 
to the mathematician, inasmuch as methods adequate to the treat-
ment of these problems can best be presented in the precise and
accurate language of mathematics. Moreover, statistical methods 
are grounded in statistical theory which is a branch of applied mathe- 
matics. 

Although it is true that some statistical problems are ultimately 
problems in advanced mathematics, many of which mathematicians 
have not yet been able to solve, nevertheless a large and interesting 
part of statistical analysis requires mathematics no more advanced 
than elementary algebra. 

It has been said that sooner or later every true science tends to 
become mathematical. The notation of mathematics is simply a 
language and it is not limited to any particular field of knowledge. 
The following quotations are inserted to help the student approach 
the study of statistics in the proper spirit. 

1. Mathematics, the science of the ideal, becomes the means of investigating, 
understanding, and making known the world of the real. — White. 
2. Probably among all the pursuits of the university, mathematics preemi-
nently demands self-denial, patience, and perseverance. ... — Todhunter. 

3. From time immemorial, there has been but one way to become a mathe- 
matician and there will never be another: it is a way interior to the subject and 
involves years of assiduous toil. Short-cuts to mathematical scholarship there 
are none, whether the seeker be a philosopher or a king. — Keyser. 

4. Will is the creative force. Without the will to learn there is no learning. 
And when the will is feeble and confused, learning lags. — Mursell. 

5. The theory of statistics is not easy, not so much because it is abstruse, as 
because the ideas are new to most people, and a good deal of hard thinking and 
patient work will be necessary. . . . Statistical work always involves a lot of 
computing [and] there is no better way of learning statistics than by working 
through examples. — Tippett. 

5. Problem Assignments. The student should realize at the out- 
set that statistical methods are not substitutes for thinking but are 
aids and supplements to it. A superficial knowledge of statistical 
technique cannot take the place of good judgment. Mere ability to 
substitute in formulas should not be confused with genuine statistical 
sophistication and insight. To the serious and capable student who 
intends to master this course, formulas will be a set of functioning 
concepts and tools rather than machines into which material may be 
fed to grind out a meaningless answer. 

This opportunity is also taken to point out that even mathemat- 
ical discourse consists of sentences. Punctuation should not be 
omitted in sequences of equations and other mathematical state- 
ments. (It is admitted, however, that many of us find this difficult 
to remember.) 

Throughout the book exercises are inserted to give the student an 
opportunity to test his knowledge of the theory and methodology, 
and to develop his power of analysis. In grading the solutions, value 
will be attached to accuracy, thoroughness, neatness, and systematic 
arrangement of the work. 

6. Calculating Machines. 1 A full description of the parts of a cal-
culating machine and their operation may be obtained from an In-
struction Book which is furnished by the manufacturer, so only a
brief description will be given here.

A calculating machine is constructed to add and subtract. By
means of continued addition or subtraction, operations involving 
multiplication, division, and square root can also be performed with 
great speed. 

1 The early history of modern computing machines is outlined in the American
Mathematical Monthly, vol. 31 (1924), pp. 422-429.
In addition to a keyboard on which numbers can be punched, most
machines have a sliding carriage, carrying two dials one above the
other. These dials are called revolution register (upper dial) and
product register (lower dial). In finding a product nx, one of the
factors n is punched on the keyboard and as the motive crank at the
side is turned, 1 the other factor x appears on the upper dial. The
product nx is then read from the lower dial.

An important property of the modern calculating machine is its
adaptability to short cuts and combinations of operations. For
example, one may multiply two numbers n and x together and add the
result to a third number k without tabulating the intermediate steps.
This is accomplished by punching the number k on the keyboard,
transferring it to the lower dial (product register), and then proceed-
ing as in finding the product nx. The result nx + k is then read
from the lower dial. An extension of this procedure is especially
useful in a series of computations where k and n are constant and
various values are assigned to x. To describe the procedure, sup-
pose it is required to calculate the successive values of 12 + 6x for
x = 5, 7, 15, 12, etc. The number k = 12 is first registered on the
lower dial, then the factor n = 6 is placed on the keyboard, and by
turning the crank forward five times to make the first value of x = 5
appear on the upper dial, the result 12 + 6 × 5 appears on the
lower dial. Instead of clearing the dial, the crank is now turned
forward twice more to rebuild the value x = 5 into x = 7, and the
result 12 + 6 × 7 can be read from the lower dial. In rebuilding
x = 15 into x = 12 the crank is turned backwards. This procedure
can be repeated until all the required values of 12 + 6x have been
calculated. A process of this sort is called the continuous method of
calculating.
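
The continuous method can be imitated in a few lines of Python. The sketch below is an illustration only: the class name CrankMachine and its attributes are hypothetical and describe no particular make of machine. It assumes that each forward turn of the crank adds the keyboard number n to the lower dial and raises the upper dial by one, and that a backward turn does the reverse.

```python
class CrankMachine:
    """Toy model of a hand-cranked calculating machine (illustrative names)."""

    def __init__(self, keyboard, product=0):
        self.keyboard = keyboard   # factor n punched on the keyboard
        self.product = product     # lower dial (product register), preset to k
        self.revolutions = 0       # upper dial (revolution register), shows x

    def turn(self, turns):
        """Turn the crank: positive turns are forward, negative are backward."""
        step = 1 if turns >= 0 else -1
        for _ in range(abs(turns)):
            self.product += step * self.keyboard
            self.revolutions += step

    def rebuild_to(self, x):
        """Rebuild the upper dial from its present value to x without clearing."""
        self.turn(x - self.revolutions)
        return self.product


def continuous_method(k, n, xs):
    """Successive values of k + n*x, obtained without clearing the dials."""
    machine = CrankMachine(keyboard=n, product=k)
    return [machine.rebuild_to(x) for x in xs]


results = continuous_method(12, 6, [5, 7, 15, 12])
print(results)  # [42, 54, 102, 84]
```

Passing from x = 15 to x = 12 calls for three backward turns, which is why the model accepts negative turns.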

In most of the exercises in this course, the computations are not 
laborious and calculating machines are not required. However, if 
machines are available they may be used to advantage in Chapters 
IV and VI. The student who desires to develop skill on a calculat- 
ing machine should begin now to study an Instruction Book and 
practice the fundamental operations explained there. 

7. Collateral Reading. Perhaps no single textbook can meet all 
the needs of all students of statistics. There are several good books 
on elementary statistics which, although not fundamentally different, 

1 The beginner will probably wish to practice on a manually operated machine 
before attempting to use the high-speed electric and automatic machines. 
present different points of view on certain topics and treat them with
varying degrees of emphasis depending upon the field of major inter- 
est. At least some of the books listed below should be readily avail- 
able on the reserve shelf of the library. The list should be useful to 
those who wish to study more fully certain details in which they may 
be interested. 

1. Bivins, The Ratio Chart in Business. Codex Book Co.
2. Burgess, The Mathematics of Statistics. Houghton Mifflin and Co.
3. Camp, The Mathematical Part of Elementary Statistics. D. C. Heath and Co.
4. Deming, Statistical Adjustment of Data. John Wiley & Sons, Inc.
5. Freeman, Industrial Statistics. Wiley.
6. Garrett, Statistics in Psychology and Education. Longmans, Green and Co.
7. Glover, Tables of Applied Mathematics. Wahr.
8. Haskell, Graphic Charts in Business. Codex Book Co.
9. Mills, Statistical Methods, Revised. Henry Holt and Co.
10. Pearl, Medical Biometry and Statistics. W. B. Saunders and Co.
11. Rider, Statistical Methods. Wiley.
12. Scarborough, Numerical Mathematical Analysis. The Johns Hopkins Press.
13. Snedecor, Statistical Methods. Collegiate Press, Inc., Ames, Iowa.
14. Treloar, Statistical Reasoning. Wiley.
15. Walker, Elementary Statistical Methods. Holt.
16. Yule and Kendall, The Theory of Statistics. Griffin and Co.



CHAPTER 1 


FREQUENCY DISTRIBUTIONS 

1. Variables and Constants. A variable is a number symbol
which may take on any value in a set of values which is called its
range. A constant is a symbol whose range consists of only one value
(in a particular discussion or situation). Letters toward the end of
the alphabet, such as x, y, u, and v, are commonly used to denote
variables. When a constant does not have a definite value such
as 2, 1/2, π, and so forth, it is customary to represent the constant by a
letter toward the beginning of the alphabet.

Two famous constants are 

\[ \pi = 3.14159\ldots, \qquad e = 2.71828\ldots \]

They occur in mathematics in many important, interesting, and 
even curious ways. As instances of the latter, the following ex- 
amples are noteworthy. 

\[ e = 2 + \frac{1}{2!} + \frac{1}{3!} + \frac{1}{4!} + \cdots, \quad \text{where } n! = n(n-1)(n-2)\cdots 1. \]

\[ \frac{\pi}{4} = \cfrac{1}{1 + \cfrac{1^2}{2 + \cfrac{3^2}{2 + \cfrac{5^2}{2 + \cdots}}}} \]


The expression for e is called a convergent infinite series and that for 
π/4 a continued fraction.
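
Both expansions are easily checked numerically. The following Python sketch is illustrative only. It sums the series for e and evaluates a truncated continued fraction for π/4, on the assumption that the continued fraction meant is the classical one with partial numerators 1², 3², 5², ... and partial denominators 2 (Brouncker's). The function names are hypothetical.

```python
import math


def e_series(terms):
    """Partial sum 1 + 1/1! + 1/2! + ... + 1/terms! of the series for e."""
    total, term = 1.0, 1.0
    for n in range(1, terms + 1):
        term /= n            # term now equals 1/n!
        total += term
    return total


def pi_over_4_continued_fraction(depth):
    """Evaluate 1/(1 + 1^2/(2 + 3^2/(2 + 5^2/(2 + ...)))) cut off after `depth` levels.

    Assumes the classical (Brouncker) continued fraction; evaluated bottom-up.
    """
    tail = 2.0
    for k in range(depth, 1, -1):
        odd = 2 * k - 1
        tail = 2.0 + odd * odd / tail
    return 1.0 / (1.0 + 1.0 / tail)


print(e_series(17), math.e)
print(4 * pi_over_4_continued_fraction(2000), math.pi)
```

The series for e settles to many decimal places within a few terms, whereas this continued fraction approaches π/4 slowly; a depth of some thousands is needed for three or four correct decimals.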

2. Variates. In general, statistical data are obtained by taking 
observations or measurements on one or more variables. The values 
thus obtained are sometimes called variates. 1 For example, in com-
puting the average monthly rainfall of a region the variable is rain-
fall and the amount of rainfall for any month is a variate. Like- 

1 A somewhat different usage of this term is explained in Part II. 
wise, if the bank clearings of the city of Madison are under considera-
tion, then the variable is bank clearings, and the clearings for any
specified interval are variates. If we denote a variable by x then
the N values which it takes on are denoted by x_1, x_2, ..., x_N.

Variates are of two kinds: continuous and discrete. Continuous
variates are values of a variable which, theoretically, can be meas-
ured to any degree of fineness, such as heights, weights, temperatures,
ages. All the numbers between x = 0 and x = 1 form a set of con-
tinuous variates. But if we restrict x to the rational numbers in
this interval we have a set of separate and distinct values with
“vacant” spaces between them. Values of a variable which are
thus restricted to particular values in order to have any meaning
are called discrete variates. Other examples of discrete variates are:
size of families, closing prices of stocks, “successes” in tossing a coin.
A set of discrete variates is usually obtained by counting whereas 
continuous variates are usually obtained by measurement. 

3. Accuracy of Measurements. In the case of continuous vari- 
ates, the observed values as recorded can never be absolutely estab- 
lished by measurement. Thus, the height or weight of an object can 
be measured only approximately, the error depending upon the pre- 
cision of the instrument and the care and accuracy of the observer. 
However, it is not always necessary that measurements be recorded 
as accurately as it is possible to make them. Similarly, in the case 
of discrete variates the standard of accuracy used may be less 
than it is possible to obtain. In population statistics, for example, 
it may be sufficient to record the numbers to the nearest 
thousand, with three zeros at the end to fill out to the decimal point. 
Thus, 

City    Population
A       326,000
B       729,000

On the other hand, the exact number of students in a university
might be required. The degree of accuracy needed is determined 
by the purpose of the investigation and it is limited by the closeness 
with which the variables can be measured. 

It follows, therefore, that the degree of accuracy in the final result 
of a problem involving computations is limited by that of the original 
data. Students sometimes carry results of problems to five or more 
decimal places when the original data do not justify more than two 
or three decimal places. A table of measurements which constitutes
the raw data for a statistical investigation should always specify the 
degree of accuracy in the readings. Thus, if monthly rainfall is being 
measured to the nearest hundredth of an inch, and one measurement 
seems to be exactly 5 inches, it should be recorded as 5.00 inches, with 
two zeros. A measurement that is merely recorded as 5 means it is 
correct to the nearest integer and its true value lies between 4.5 and 
5.5, whereas 5.00 means the true value is known to lie between 4.995 
and 5.005. The three digits in 5.00 are said to be significant. 
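
The convention just described can be put in the form of a small computation. The Python sketch below is an illustration under the stated convention that a reading is correct to the nearest unit of its last written digit. The function name implied_bounds is hypothetical. Readings are passed as strings so that trailing zeros are not lost.

```python
from decimal import Decimal


def implied_bounds(recorded):
    """Return the (low, high) limits implied by a reading as written."""
    value = Decimal(recorded)
    # The exponent gives the place of the last written digit:
    # 0 for "5", -2 for "5.00", 3 for "326E3" (nearest thousand).
    unit = Decimal(1).scaleb(value.as_tuple().exponent)
    half = unit / 2
    return value - half, value + half


print(implied_bounds("5"))     # limits 4.5 and 5.5
print(implied_bounds("5.00"))  # limits 4.995 and 5.005
```

A population written as 326,000 does not by itself show that it is rounded to the nearest thousand; in this sketch that must be stated explicitly, for instance as "326E3".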

4. Necessity for Classification. After the data have been col- 
lected in any statistical investigation the first step has to do with 
introducing order in the raw material. Usually we have some hun- 
dreds of variates which have been recorded merely in the arbitrary 
order in which the observations or measurements happened to be 
made. But in order to analyze a series of variates so that intelligent 
judgments may be formed about it or that comparisons may be made 
between two series of variates, proper classification is necessary and 
of prime importance. 

Such classification is not always an easy thing to effect, because it 
is the one part of statistical methods for which no very definite rules 
can be given. Most people, until they have tried, imagine that to 
collect and arrange data in classes and in tables is a straightforward 
procedure involving no great technique or experience. Although 
much can be learned from a careful study of the illustrations and dis- 
cussions that appear in the following pages and the compilations of 
reputable bureaus such as the census volumes, nevertheless, experi-
ence is the best teacher in effecting the most appropriate classification
for any set of variates. 

5. Tabulation. In carrying out the process of classification, it
becomes natural to arrange the results in tabular form, setting forth 
clearly and explicitly the statistics one wishes to present. In draw- 
ing up any table the following general rules should be observed: 

(1) Every table must be self-explanatory. To accomplish this 
the title should be short, but not at the expense of clearness. 

(2) Full explanatory notes, when necessary, should be incorporated 
in the table, either directly under the descriptive title and 
before the body of the table, or else directly under the form. 

(3) The columns and rows should be arranged in a logical order to
facilitate comparisons. 
(4) In tabulating long columns of figures, spaces should be left
after every five or ten rows. Long unbroken columns are con- 
fusing, especially when one is comparing two numbers in a 
row but in widely separated columns. 

(5) If the numbers tabulated have more than three significant 
figures, the digits should be grouped in threes. Thus, one 
should write 4 685 732, not 4685732.

(6) Double lines at the top (or at the top and bottom) may en-
hance the effectiveness of a table. If the table nicely fills
the width of the page, no side lines should be used. In such
cases the omission of the side lines will have the tendency to
emphasize the other vertical lines and cause the interior col-
umns to stand out better. The columns should not be widely
separated and the form of a narrow, compact table should
have its side lines.

The following points are particularly important in practical work: 

(7) Source of data should be included. 

(8) Units of the data presented should be clear. 

(9) Accuracy of transcription must not only be striven for but 
actually achieved. A reader who finds one error (even though 
this be the only one) is likely to disparage the whole table. 
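
Rule (5) is simple to mechanize. The short Python sketch below is an illustration only. The function name group_in_threes is hypothetical, and a plain space is assumed as the separator.

```python
def group_in_threes(n):
    """Write a whole number with its digits grouped in threes, separated by spaces."""
    # Python's "," format option groups digits in threes; swap commas for spaces.
    return f"{n:,}".replace(",", " ")


print(group_in_threes(4685732))  # 4 685 732
```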


Table 1. Grades of 100 Students in Freshman Mathematics

75  86  66  86  50  78  66  79  68  60
80  83  87  79  80  77  81  92  57  52
58  82  73  95  66  60  84  80  79  63
80  88  58  84  96  87  72  65  79  80
86  68  76  41  80  40  63  90  83  94
76  66  74  76  68  82  59  75  35  34
65  63  85  87  79  77  76  74  76  78
75  60  96  74  73  87  52  98  88  64
76  69  60  74  72  76  57  64  67  58
72  80  72  56  73  82  78  45  75  56


6. Frequency Distribution. From the standpoint of a mathemati- 
cal analysis of statistics, the most important form of tabulation is 
the so-called frequency distribution. Rough data do not present 
any clear ideas of description unless they are organized and condensed 
in a systematic way. We therefore partition the raw data into 
classes of appropriate size, showing the corresponding frequency of 
variates in each class. When any set of statistics is systematically
arranged in this way it is called a frequency distribution. For ex-
ample, upon an examination of the raw data of Table 1, it is difficult
to state any very definite conclusions as to whether these grades rep-
resent preponderantly good students or poor ones. The frequency
distribution of Table 2, however, does give us more precise infor-
mation. We see at a glance that there were 32 students with grades
between 70 and 80, and that all but 16 had grades of 60 or above. In
Table 3, the confusion of detail is still more apparent. The corre-
sponding frequency distribution is given in Table 4.

Table 2. Frequency Table of 100 Grades

Class Limits   Tally Marks                                 Frequency
30-39          //                                               2
40-49          ///                                              3
50-59          ///// ///// /                                   11
60-69          ///// ///// ///// /////                         20
70-79          ///// ///// ///// ///// ///// ///// //          32
80-89          ///// ///// ///// ///// /////                   25
90-99          ///// //                                         7
Total                                                         100
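
The tallying that leads from Table 1 to Table 2 can be reproduced by a short program. The Python sketch below is an illustration. The grades are copied from Table 1, while the function name frequency_distribution and its parameters are hypothetical. Classes of width 10 beginning at 30 are assumed, as in Table 2.

```python
from collections import Counter

# The 100 grades of Table 1, in reading order.
GRADES = [
    75, 86, 66, 86, 50, 78, 66, 79, 68, 60,
    80, 83, 87, 79, 80, 77, 81, 92, 57, 52,
    58, 82, 73, 95, 66, 60, 84, 80, 79, 63,
    80, 88, 58, 84, 96, 87, 72, 65, 79, 80,
    86, 68, 76, 41, 80, 40, 63, 90, 83, 94,
    76, 66, 74, 76, 68, 82, 59, 75, 35, 34,
    65, 63, 85, 87, 79, 77, 76, 74, 76, 78,
    75, 60, 96, 74, 73, 87, 52, 98, 88, 64,
    76, 69, 60, 74, 72, 76, 57, 64, 67, 58,
    72, 80, 72, 56, 73, 82, 78, 45, 75, 56,
]


def frequency_distribution(values, lowest=30, width=10):
    """Return a list of (lower limit, upper limit, frequency), one per class."""
    counts = Counter((v - lowest) // width for v in values)
    table = []
    for i in range(max(counts) + 1):
        lower = lowest + i * width
        table.append((lower, lower + width - 1, counts.get(i, 0)))
    return table


TABLE = frequency_distribution(GRADES)
for lower, upper, freq in TABLE:
    print(f"{lower}-{upper}  {freq:3d}")
```

The frequencies obtained are 2, 3, 11, 20, 32, 25, and 7, in agreement with the remarks above: 32 grades fall in the class 70-79, and 16 grades are below 60.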

The width of a class is called the class interval, and in general 
the successive class intervals should be of equal width. The mid- 
value of such an interval is variously called the class mark, mid- 
value, central value. The width of a class interval is therefore 
seen to be the common difference between two consecutive class 
marks. It is also the difference between the lower (or upper) 
limit of two successive classes. Thus, in Table 4, the class inter-
val is half an inch and the successive class marks are 0.245, 0.745,
etc., inches.
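
These definitions may be checked by direct computation. The Python sketch below is an illustration only. The function names are hypothetical. The grade classes are those of Table 2, and the rainfall limits 0.00-0.49, 0.50-0.99, ... are an assumption chosen to agree with the class marks 0.245, 0.745 quoted above.

```python
from decimal import Decimal


def class_mark(lower, upper):
    """Mid-value of a class, given its lower and upper limits as strings."""
    return (Decimal(lower) + Decimal(upper)) / 2


def class_interval(limits):
    """Common difference between the lower limits of successive classes."""
    lowers = [Decimal(lo) for lo, _ in limits]
    widths = {b - a for a, b in zip(lowers, lowers[1:])}
    if len(widths) != 1:
        raise ValueError("classes are not of equal width")
    return widths.pop()


grade_limits = [("30", "39"), ("40", "49"), ("50", "59")]
# Assumed class limits for the rainfall table (not given in the text above).
rain_limits = [("0.00", "0.49"), ("0.50", "0.99"), ("1.00", "1.49")]

grade_marks = [class_mark(lo, hi) for lo, hi in grade_limits]
rain_marks = [class_mark(lo, hi) for lo, hi in rain_limits]
print(grade_marks, class_interval(grade_limits))
print(rain_marks, class_interval(rain_limits))
```

In each case the difference between consecutive class marks equals the class interval found from the lower limits, as stated in the text.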

7. Class Intervals. Grouping variates into the most appropriate 
number of classes is a matter of judgment. The choice of intervals 
to be used in tabulating any particular set of variates depends upon 
the nature and characteristics of the data and the purpose for which 
it is to be used. In the case of discrete variates, the unit is a natural 
interval and sometimes it is satisfactory. (See Tables 10 and 11.) 
However, for both discrete and continuous variates the following 
conditions should guide the choice: (a) We desire to be able to treat 
all the values assigned to any one class, without serious error, as if 


Table 3 — Monthly Rainfall at Iowa City, 1890-1925

Year   Jan.   Feb.   Mar.   Apr.    May   June   July   Aug.  Sept.   Oct.   Nov.   Dec.
1890   2.75   0.75   1.80   1.83   2.20   7.99   0.30   2.29   1.44   2.11   1.56   0.31
1891   1.49   1.30   4.41   1.11   4.46   2.80   3.01   3.45   2.33   1.63   2.93   2.72
1892   1.46   1.23   3.15   4.30   9.23   8.29   6.20   2.50   1.18   1.02   1.38   2.84
1893   1.18   1.75   2.82   4.37   1.79   3.01   3.56   1.64   3.07   1.98   1.75   1.52
1894   1.95   1.64   2.03   2.72   3.09   2.??   0.90   2.40   4.96   2.30   1.80   0.98
1895   2.37   0.04   1.25   1.66   4.26   1.10  10.10   1.77   3.43   1.38   1.78   2.84
1896   0.70   1.51   0.92   5.11   4.10   1.86   7.04   2.44   1.82   2.74   1.16   0.55
1897   3.66   1.30   2.07   4.60   3.11   2.38   3.83   1.85   3.54   0.33   1.98   2.48
1898   4.62   1.15   3.02   2.89   4.80   3.26   2.27   2.85   2.54   4.38   1.10   0.53
1899   0.59   1.82   1.43   3.23   9.49   4.50   3.78   2.39   0.93   1.66   1.15   1.93
1900   0.73   2.20   3.32   3.31   4.31   2.18   5.25   6.27   4.35   3.61   1.43   0.75
1901   1.07   1.97   3.62   2.36   1.54   3.33   1.29   0.66   2.56   1.78   0.79   2.34
1902   1.29   0.85   1.29   1.91   3.75   7.46   0.89  10.91   5.87   3.12   2.25   2.21
1903   0.67   1.03   1.86   3.11   6.90   1.95   4.76   3.45   5.38   3.60   0.97   1.27
1904   1.74   0.84   2.73   5.49   2.68   2.14   2.40   3.93   3.12   1.59   0.25   1.96
1905   1.22   1.90   2.28   3.36   5.37   6.68   3.59   2.62   1.54   5.36   2.92   1.04
1906   2.51   1.73   2.25   1.83   2.33   3.64   1.42   5.34   0.89   1.48   3.08   1.64
1907   2.12   0.22   1.59   1.58   5.47   6.04   9.21   2.98   2.85   0.86   1.07   0.53
1908   0.32   2.08   2.94   2.78   7.78   2.87   5.40   7.47   1.82   1.99   1.84   0.43
1909   1.97   1.09   2.00   7.21   4.40   4.58   5.75   1.88   2.43   1.59   4.88   2.52
1910   1.79   0.39   0.28   2.56   3.57   0.98   2.22   4.98   3.87   0.57   0.69   0.46
1911   0.87   4.82   1.30   3.02   4.74   2.98   3.70   4.27   5.07   2.78   3.01   2.29
1912   0.26   1.21   2.30   3.50   2.88   2.60   3.60   3.62   2.67   3.54   1.11   0.75
1913   1.19   1.42   2.69   1.83   6.91   6.28   0.39   2.97   3.19   3.66   0.46   1.02
1914   1.28   0.93   2.63   2.37   4.87   5.32   1.53   2.99   7.97   1.65   0.37   1.89
1915   2.15   2.42   0.92   0.65   7.65   4.33   8.11   1.80   9.31   1.84   1.80   0.80
1916   3.18   0.59   5.00   1.83   5.99   3.92   1.57   2.83   3.49   3.19   1.42   1.15
1917   1.09   0.19   2.19   3.43   7.33   6.49   2.84   2.79   6.23   2.28   0.30   0.57
1918   1.10   1.46   0.33   3.43   6.22   8.36   ?.87   6.72   2.00   2.05   2.10   1.62
1919   0.08   2.63   2.65   4.28   4.49   7.07   1.03   2.67   5.10   4.01   3.84   0.61
1920   0.84   1.33   4.22   4.75   3.76   2.86   2.79   2.90   1.20   0.98   1.80   2.45
1921   0.35   0.49   2.46   6.20   4.44   2.46   3.59   8.61   7.83   2.47   0.74   3.19
1922   1.11   1.46   2.18   3.49   5.52   0.28   6.46   1.03   2.91   1.06   5.28   0.49
1923   1.09   0.67   4.83   0.86   2.63   6.21   2.37   4.01   9.27   2.35   1.13   0.73
1924   1.35   0.83   2.10   1.09   1.69   8.71   3.67   5.67   2.60   1.64   0.93   1.75
1925   0.29   1.04   0.99   3.07   1.06   5.61   3.63   3.14   5.59   3.90   1.00   1.66

they were equal to the class mark for that interval; e.g., as if all
23 items in the first class of Table 4 were exactly 0.245 inches, etc.
(b) For convenience and brevity we desire to make the interval as
large as possible subject to the first condition. These conditions will 
generally be fulfilled if the interval is so chosen that the whole num- 

Table 4 — Frequency Table of Monthly Rainfall at Iowa City, 1890-1925

Class Interval     Mid-x    Frequency
 0.00- 0.49        0.245        23
 0.50- 0.99        0.745        42
 1.00- 1.49        1.245        58
 1.50- 1.99        1.745        62
 2.00- 2.49        2.245        49
 2.50- 2.99        2.745        47
 3.00- 3.49        3.245        32
 3.50- 3.99        3.745        27
 4.00- 4.49        4.245        18
 4.50- 4.99        4.745        15
 5.00- 5.49        5.245        14
 5.50- 5.99        5.745         7
 6.00- 6.49        6.245        10
 6.50- 6.99        6.745         5
 7.00- 7.49        7.245         6
 7.50- 7.99        7.745         5
 8.00- 8.49        8.245         3
 8.50- 8.99        8.745         2
 9.00- 9.49        9.245         5
 9.50- 9.99        9.745         0
10.00-10.49       10.245         1
10.50-10.99       10.745         1
Total                          432


ber of classes lies between 10 and 25. A small number of classes 
may “ cover up ” too much detail whereas a large number may 
reveal too much detail for one to comprehend readily (which is 
just the objection to the table of original data). A preliminary 
inspection of the data should accordingly be made and the highest 
and lowest values selected. Dividing the difference between these 
by the tentative number of classes, we have our approximate value 
Table 5 — Monthly Rainfall at Des Moines, 1890-1925

Year   Jan.   Feb.   Mar.   Apr.    May   June   July   Aug.  Sept.   Oct.   Nov.   Dec.
1890   2.62   1.17   0.91   0.78   3.00   4.91   1.10   3.35   1.57   4.48   0.74   0.11
1891   1.82   1.13   2.25   2.12   3.29   5.60   2.78   4.22   1.64   2.11   1.34   1.54
1892   1.60   1.35   2.47   3.36   8.77   3.41   8.64   2.45   1.12   2.54   0.76   1.95
1893   0.56   1.28   1.15   5.61   0.84   4.69   3.55   1.60   1.33   0.22   1.51   1.30
1894   1.09   1.39   1.78   1.70   1.41   1.67   0.29   1.89   4.46   2.24   0.99   1.15
1895   1.30   0.00   0.50   3.41   2.86   5.26   3.10   3.57   3.20   0.29   0.85   1.86
1896   0.60   0.79   1.24   3.47   6.50   2.69   8.15   5.49   3.61   2.69   1.10   0.85
1897   2.02   0.71   2.13   7.37   2.31   3.15   2.88   1.77   1.56   0.85   0.34   1.98
1898   1.59   0.82   1.35   2.64   4.22   6.85   1.86   1.09   1.91   3.56   1.87   0.57
1899   0.29   0.57   1.04   2.22   6.71   3.53   3.20   3.53   1.17   0.59   1.76   2.12
1900   0.20   0.50   3.07   3.82   4.76   4.89   5.15   8.02   3.66   3.08   0.96   0.35
1901   1.01   1.11   3.02   2.26   1.40   2.41   1.72   0.67   2.60   2.14   0.40   1.03
1902   0.91   0.52   1.15   1.55   4.69   7.27   5.95   7.82   5.03   3.70   1.65   1.77
1903   0.20   1.12   1.09   1.64   0.64   3.06   3.62   6.72   1.62   1.32   0.31   0.09
1904   1.22   0.22   1.20   5.48   3.16   2.0?   6.94   2.60   1.95   1.50   0.06   2.02
1905   1.08   1.00   2.16   3.29   4.44   5.73   4.53   5.21   3.17   3.64   2.34   0.55
1906   2.07   0.86   1.84   2.96   2.21   3.80   2.67   4.69   3.24   1.18   2.29   1.46
1907   0.87   0.93   1.18   1.48   2.97   4.13  10.20   5.03   2.40   1.70   1.12   1.01
1908   0.46   1.15   1.43   2.69   9.89   5.93   1.56   6.54   0.94   3.68   0.95   0.31
1909   1.61   0.90   1.56   5.14   4.24   7.01   4.41   0.14   2.06   2.89   3.71   2.32
1910   1.72   0.20   0.33   1.13   3.26   3.11   0.86   2.40   3.82   0.68   0.53   0.20
1911   0.84   2.91   1.14   4.23   2.44   0.75   1.16   1.82   7.68   2.61   1.22   3.18
1912   0.53   1.86   2.87   2.75   5.62   2.60   3.07   3.52   4.20   3.75   1.11   0.30
1913   1.10   0.65   3.03   3.41   5.06   3.52   1.05   3.44   2.65   2.67   1.03   1.05
1914   0.85   1.24   1.18   1.52   4.83   3.89   1.22   1.77   4.81   3.57   0.35   1.28
1915   1.96   3.20   1.16   1.36   8.21   3.60   9.39   1.71   4.51   0.43   1.24   0.65
1916   2.66   0.61   0.60   2.44   3.87   2.42   1.50   2.62   1.72   2.11   1.46   0.65
1917   6.53   0.52   2.30   5.52   3.94   8.16   1.58   1.82   1.99   0.92   0.21   0.88
1918   0.78   1.45   0.29   1.81   5.87   5.63   1.18   2.54   0.91   3.81   2.10   1.35
1919   0.08   3.00   3.67   5.30   2.96   7.36   2.68   2.19   7.47   2.20   3.84   0.93
1920   0.44   0.74   3.92   4.09   3.14   1.25   5.66   2.11   4.44   1.89   1.63   1.38
1921   0.59   0.92   1.07   3.72   3.62   4.66   2.49   6.63   7.16   1.51   0.35   0.80
1922   0.85   0.64   2.25   2.84   6.87   1.63   7.13   6.63   3.00   3.41   2.54   0.25
1923   0.88   0.36   4.34   1.76   4.78   4.95   0.78   5.34   5.17   1.10   0.55   0.61
1924   1.02   1.98   3.10   0.78   1.26   9.30   0.98   4.15   3.47   0.77   0.53   1.62
1925   0.23   0.50   0.88   1.64   0.77   6.40   2.21   4.79   3.75   3.22   0.32   1.67

of the interval. After a little preliminary reconnoitering an appropriate
number of classes and their limits can be determined. Thus,
in Table 3, the highest value noted was 10.91 and the lowest 0.08
(verify). The difference between these is 10.83, which suggests that
if we took 20 classes we would have approximately a half inch as the
width of a class interval. This, however, assumes we would start
with 0.08 as our lower limit, which would give us awkward figures as
limits. Therefore, our judgment suggests it would be better to start
with 0 and continue by half-inch intervals as far as is necessary to
take in the range of the given variates. We have estimated it will
take approximately 20 of these; actually it turns out to be 22. This
number of intervals and their width is consistent with the general
conditions (a) and (b) given above. On page 16 are given some
supplementary rules which in general are helpful in making a frequency
distribution.
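
The arithmetic of this choice can be sketched in Python. This is an illustration under stated assumptions, not the author's procedure: the variable names are hypothetical, and the "judgment" step (rounding to a half inch and starting at 0) is simply written in by hand.

```python
import math

highest, lowest = 10.91, 0.08   # extreme values noted in Table 3
tentative_classes = 20          # tentative number of classes

# Approximate width: the range divided by the tentative number of classes.
approx_width = (highest - lowest) / tentative_classes

# Judgment step (chosen by hand): a convenient width and a convenient origin.
width, start = 0.5, 0.0

# Number of half-inch classes needed to take in the highest value.
n_classes = math.floor((highest - start) / width) + 1

print(round(approx_width, 4), n_classes)
```

The approximate width comes out a little over half an inch, and 22 half-inch classes starting at 0 are needed, in agreement with Table 4.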

8. Distinction between Class Limits and Class Boundaries. The
pairs of numbers written in the column of classes of a frequency distribution
are the lower and upper class limits, sometimes called open
class limits. For instance, 1.00-1.49 are the limits of the third class
of Table 4. When the measurements of Table 3 were made, readings
were recorded to the nearest hundredth of an inch. Thus, a measure- 
ment which was more than 1.485 and less than 1.495 was recorded 
as 1.49. Likewise, if a measurement was more than 0.995 but less 
than 1.005, it would be recorded as 1.00. Therefore, the third class 
of Table 4 includes all measurements more than 0.995 and less than 
1.495. These values are then the true or closed limits of the third
class and are known as class boundaries or end values . A class bound- 
ary is the value halfway between the upper limit of one class and the 
lower limit of the next class. For example, the upper boundary of 
the fourth class of Table 4 is 1.995 which is the lower boundary 
of the fifth class. If we denote the variate values by x, the 
following table illustrates these remarks for the first five classes of 
Table 4. 


Class Limits     End-x     Mid-x
0.00-0.49        0.495     0.245
0.50-0.99        0.995     0.745
1.00-1.49        1.495     1.245
1.50-1.99        1.995     1.745
2.00-2.49        2.495     2.245


The width of a class interval is the same, however, whether the
classes are expressed in terms of class limits or class boundaries, being
the difference between the beginning of one class and the beginning
of the next class. Similarly, the class mark as the mid-point of the
interval is unaffected. Thus, for the class limits 1.00-1.49, the
class mark is ½(1.00 + 1.49) = 1.245; for the corresponding class
boundaries, the class mark is ½(0.995 + 1.495) = 1.245.

The distinction between class limits and class boundaries is an 
important one in plotting graphs, but in tabulating it is the class 
limits that should be expressed. 
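
A brief Python sketch may make the distinction concrete. It is an illustration only: the function name `class_boundaries` and its `unit` parameter are hypothetical, and it assumes, as in Table 3, that readings were recorded to the nearest hundredth.

```python
def class_boundaries(lower, upper, unit=0.01):
    """Return the class boundaries (true limits) of a class whose recorded
    limits are lower..upper, assuming readings rounded to the nearest unit."""
    half = unit / 2
    return round(lower - half, 3), round(upper + half, 3)

# Third class of Table 4.
lo_b, hi_b = class_boundaries(1.00, 1.49)

# The class mark is the same whether computed from limits or boundaries.
mark_from_limits = round((1.00 + 1.49) / 2, 3)
mark_from_boundaries = round((lo_b + hi_b) / 2, 3)

# The width is the distance between the two boundaries.
width = round(hi_b - lo_b, 3)

print(lo_b, hi_b, mark_from_limits, mark_from_boundaries, width)
```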

9. Rules for Making a Frequency Distribution. 

(1) Determine the range of the table by finding the difference be- 
tween the highest value and the lowest value among the items. 

(2) Determine the number of equal parts into which the range 
shall be divided. The size of the class interval and the num- 
ber of intervals depend upon the size and nature of the distri- 
bution. (Table 1 contains rather fewer classes than is usually 
desirable but an interval of 10 units is quite conventional in 
students' grades. An interval of 5 would be used if grades
of A, A−, B, B−, etc., were given instead of A, B, etc.) Intervals
of 0.5, 1, 2, 3, 5, 7, or 10 are the most common.

(3) Arrange a sheet with three headings: class interval, tally 
marks, frequency. 

(4) Read off the items in the raw table and for each one record a 
mark, as shown in Table 2.

(5) Write the sum of the marks in each row in the frequency col- 
umn. The sum of the frequencies should, of course, equal the 
total number of variates. 
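
The tallying described in rules (3) to (5) can be sketched in Python as follows. This is a minimal sketch, not part of the original text: the function name `frequency_table` is hypothetical, the sample is only the 1890 row of Table 3, and values falling outside the listed classes are simply not counted.

```python
from collections import Counter

def frequency_table(values, start, width, n_classes):
    """Tally values into n_classes equal classes beginning at start.
    Returns a list of frequencies, one per class."""
    counts = Counter()
    for v in values:
        # Work in hundredths so that values at a class limit are not
        # misplaced by floating-point error.
        k = (round(v * 100) - round(start * 100)) // round(width * 100)
        counts[k] += 1
    return [counts[k] for k in range(n_classes)]

# Iowa City rainfall for 1890 (first row of Table 3).
sample = [2.75, 0.75, 1.80, 1.83, 2.20, 7.99,
          0.30, 2.29, 1.44, 2.11, 1.56, 0.31]

freq = frequency_table(sample, start=0.0, width=0.5, n_classes=16)

# Rule (5): the frequencies should add up to the number of variates.
print(freq, sum(freq) == len(sample))
```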

10. Cumulative Frequencies. The frequencies with which we have 
been concerned may be called absolute frequencies to distinguish 
them from two other kinds which will be mentioned in this course; 
namely, cumulative frequencies and relative frequencies. The first 
of these will be considered here. 

Sometimes a statistical investigation is concerned with the number
or percentage of variates which are “less than” or “more than”
a given value. This is frequently the case in educational tests and
in wage or salary statistics. Our chief interest in such cases may be
the accumulated frequency of the several class intervals up to some
class boundary. Hence we are led to form a cumulative frequency
table. Such a table is built up by successively adding the several
(absolute) frequencies; thus: $f_1$, $f_1 + f_2$, $f_1 + f_2 + f_3$, etc., as illustrated
in Table 7, where the data of Table 6 are used. We shall use
N to denote the sum of all the frequencies.


Table 6 — Distribution of Intelligence Quotients (IQ’s) of 905 School
Children from 5 to 14 Years of Age. (Derived from
L. M. Terman, The Measurement of Intelligence)

IQ         Number of Children
 55- 64            3
 65- 74           21
 75- 84           78
 85- 94          182
 95-104          305
105-114          209
115-124           81
125-134           21
135-144            5


The cumulative frequency (cum f) at any class is the total (absolute)
frequency up to the upper boundary of that class. This is the
reason for placing the cum f entries opposite the end-x values and on
lines between the mid-x entries. Thus, in the cum f column of Table
7, three students had IQ’s less than 64.5, 24 less than 74.5, etc. The


Table 7 — Cumulative Distribution of IQ's (Table 6)

Class Mark   Frequency    Upper Boundary   Cum f                (Cum f)/N
  Mid-x          f            End-x

                               54.5           0                   0.000
   59.5          3 = f₁
                               64.5           3 = f₁              0.003
   69.5         21 = f₂
                               74.5          24 = f₁ + f₂         0.027
   79.5         78
                               84.5         102                   0.113
   89.5        182
                               94.5         284                   0.314
   99.5        305
                              104.5         589                   0.651
  109.5        209
                              114.5         798                   0.882
  119.5         81
                              124.5         879                   0.971
  129.5         21
                              134.5         900                   0.994
  139.5          5
                              144.5         905 = N               1.000



entries in the column headed (cum f)/N give the percentages of the
total frequency which are less than the values of the end-x column.
Thus, from this column in Table 7, we can readily see that 88% of
the children had IQ’s less than 114.5 and only 11% less than 84.5.

Table 7 is known as a “less than” table. One could of course
cumulate the frequencies from the bottom of the table, getting a
“more than” distribution. The cum f column would then give the
number of children whose IQ’s are more than the values at the lower
boundaries of the several class intervals.

The inverse operation to cumulating the frequencies is called
“differencing” and is usually denoted by Δ (delta). If S denotes
any series of values, then ΔS denotes the results obtained by subtracting
the first value of S from the second value, the second from
the third, etc. Differencing a column of cumulative frequencies
obviously gives the absolute frequencies. Differencing a column
of (cum f)/N values gives the f/N values.
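
Cumulating and differencing can be sketched in a few lines of Python, using the frequencies of Table 6. This is an illustration rather than part of the original text; the function name `difference` is hypothetical, and the proportions are rounded to three decimals as an assumption.

```python
from itertools import accumulate

freqs = [3, 21, 78, 182, 305, 209, 81, 21, 5]          # Table 6
upper_boundaries = [64.5, 74.5, 84.5, 94.5, 104.5,
                    114.5, 124.5, 134.5, 144.5]

# "Less than" cumulative frequencies, one per upper boundary.
cum_f = list(accumulate(freqs))
N = cum_f[-1]
cum_rel = [round(c / N, 3) for c in cum_f]

def difference(series):
    """Subtract each value from the one that follows it."""
    return [b - a for a, b in zip(series, series[1:])]

# Differencing the cum f column (with the initial 0 at the lowest
# boundary) recovers the absolute frequencies.
recovered = difference([0] + cum_f)

for boundary, c, r in zip(upper_boundaries, cum_f, cum_rel):
    print(boundary, c, r)
```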

Exercises 

1. What is the width of the class interval and the values of the class marks
in Table 2?

2. Tabulate the grades of Table 1, using class intervals of 5 units.

3. With reference to Table 3, is it easy to answer such questions as the following:

(a) In how many instances was the monthly rainfall between 2 inches and
3 inches?

(b) In how many instances was the rainfall less than 5 inches?

(c) What was the smallest monthly rainfall recorded?

(d) What per cent of the total measured between 5 inches and 10 inches?

(e) What measurement is the most common?

4. Refer to Table 4 and then answer the above questions.

5. Using your own judgment as to the most appropriate class interval, make
a frequency distribution of the monthly rainfall for Des Moines from
1890 to 1925 (Table 5).

6. For Table 6 state the class boundaries (end values) and the class marks.

7. Difference the cum f column of Table 7.

8. Read the following references:

(a) Mathematics Essential for Elementary Statistics — Walker, Chapter II.

(b) Standards and Requirements in Statistics — Belcher. Journal American
Statistical Association, vol. 21, p. 424.

11. Additional Distributions. The following distributions which 
will be referred to in subsequent chapters will serve as illustrative 
and laboratory material. They are not chosen on account of the 
importance of the data but merely to exemplify methods. 
Table 8 — Distribution of Lengths of 995 Telephone Calls.
Time in Seconds

Time        Number of Calls
  0- 99            1
100-199           28
200-299           88
300-399          180
400-499          247
500-599          260
600-699          133
700-799           42
800-899           11
900-999            5

(For future reference: x̄ = 477.3 secs., σ = 148.5 secs.)

Table 9 — Distribution of Weight in Pounds Among
1000 8-Year-Old Glasgow Schoolgirls

Weight (mid-values)     Frequency
       29.5                  1
       33.5                 14
       37.5                 56
       41.5                172
       45.5                245
       49.5                263
       53.5                150
       57.5                 67
       61.5                 23
       65.5                  3
Table 11

Twelve dice were thrown 4096 times; a throw of 4, 5, or 6 points being reckoned
a success. The following distribution was recorded:

Successes     Frequency
    0              0
    1              7
    2             60
    3            198
    4            430
    5            731
    6            948
    7            847
    8            536
    9            257
   10             71
   11             11
   12              0

(For future reference: x̄ = 6.139, σ = 1.712)


Table 12 — Frequency Distribution of the Weights of 1000 Male
Students (Original Measurements Made to Nearest Half Pound)

Class (Pounds)    Class Mark    Frequency    Cumulative Frequency
  90- 99.5           94.75           2                 2
 100-109.5          104.75          21                23
 110-119.5          114.75         104               127
 120-129.5          124.75         196               323
 130-139.5          134.75         248               571
 140-149.5          144.75         197               768
 150-159.5          154.75         133               901
 160-169.5          164.75          47               948
 170-179.5          174.75          25               973
 180-189.5          184.75          14               987
 190-199.5          194.75           7               994
 200-209.5          204.75           4               998
 210-219.5          214.75           0               998
 220-229.5          224.75           0               998
 230-239.5          234.75           1               999
 240-249.5          244.75           1              1000

(For future reference: x̄ = 138.65, σ = 18.03, α₃ = .94)
Table 13 — Distribution of Span (Central Values) in Inches Among 2000
Adult Males (Original Measurements to the Nearest Inch)

Span     Frequency        Span     Frequency
58.5          1           71.5        217
59.5          2           72.5        176
60.5          1           73.5        132
61.5          6           74.5         82
62.5          7           75.5         48
63.5         22           76.5         20
64.5         55           77.5         16
65.5        111           78.5         12
66.5        140           79.5          3
67.5        182           80.5          1
68.5        229           81.5          2
69.5        265           82.5          1
70.5        263           Total      2000


The following references are recommended to those who desire some distributions
which may be more interesting in themselves:

(a) Per cent Distribution of Deaths in Each Age Period, by Specified Causes.
White Males and White Females, United States, 1942. Source: Metropolitan
Life Insurance Company, Statistical Bulletin, October 1945, p. 7.

(b) Age of American Military Leaders. Source: Metropolitan Life Insurance
Company, Statistical Bulletin, June, July, August, 1945.

(c) Employment Status of the Population by Age and Sex. Source: Population,
Third Series, The Labor Force, Table 5, 16th Census.

(d) Distribution of Population by Age. Source: Statistical Abstract, 1943,
p. 24.
GRAPHICAL REPRESENTATION

1. The Function Concept. Variables which are linked or related 
in some way are encountered in various fields of human experience. 
Several variables may be linked but we shall, for the present, con- 
sider the simple case where only two variables are involved. For 
example, the two related variables may be time and population, 
variate and frequency, rate of interest and accumulated principal, 
age and insurance premium. The primary purpose of a graph is to 
show diagrammatically how the values of one of two linked variables 
change with those of the other. One of the most useful applica- 
tions of the graph occurs in connection with the representation of 
statistical data. 

Underlying the intelligent use of graphs is the concept of function,
which is a fundamental notion in mathematics and its applications.
The mathematical meaning of function is a technical one, entirely
different from the ordinary meaning. The student usually meets
the word for the first time in algebra, when a linear or quadratic
expression is spoken of as a function of x. An example is the equation

$$y = P(1 + x)^2.$$

The expression on the right is the function of x (P being constant)
and for convenience it is denoted by the single letter y. Here x is an
interest rate, expressed as a decimal, and y denotes the amount to which
P dollars will accumulate in two years at that rate per year.

The statement that y is a function of x is written symbolically in
the form

$$y = f(x).$$

This implies that a value of the function y is determined when a value
is assigned to the variable x. For this reason, x is called the independent
variable and y the dependent variable. In place of f other letters
may be used. Thus, any one of the symbols

$$g(x), \quad h(x), \quad F(x), \quad \phi(x),$$

and so on, denotes a function of x. The same symbol may be used
to denote different functions in different problems, but different
symbols are required to represent different functions in the same
problem or discussion.

Examples: 

fU) = fir 2 - 3x + 2, 

<t>(x) = Ke-*. 


Any mathematical expression involving a variable x is a function
of x. However, the word is often used to designate a relation that is
completely divorced from any equation or expression. The central
idea conveyed by this more general meaning is that of a correspondence
between values of y and values of x. The following definition is
the result of a development over a long period and its formulation is
due to Dirichlet, a famous German mathematician (1805-59).

Definition. Let there be a set of values assumed by the independent
variable x. If to each x in the set, there corresponds one or
more values of y, then y is said to be a function of x in the set.

It should be observed that this definition¹ is freed from any notion
of the necessity of specifying the mathematical relation between x 
and y. We may or may not know the special method by which the 
correspondence is set up. A mathematical formula or equation be- 
tween x and y may not even exist. A function may thus be 
considered as being equivalent to a table in which one may look up 
any x of the set of the definition, and find the corresponding y. 

Much of the data in statistics comes under this general definition
of function. Thus, in the following table, net earning is a function
of the year, whether or not there is any equation defining that
functional relationship.

Here the function is defined only for the indicated points which
correspond to the values given in the table. The straight lines are
drawn to help the reader visualize the relative positions of these
values and not to represent the function at intermediate points.
They may, however, be thought of as a first approximation to the
unknown function between the given values. Such a representation of
the function could not, of course, be assumed in the case of discrete
variates because then the function is discontinuous and does not exist
except for the given values.

¹ A classical example is the function which is defined for the infinite
set of numbers from x = 0 to x = 1 to be unity for all rational numbers
and zero for all irrational numbers.

Referring again to the above definition, if there is only one value 
of y corresponding to each value of x then y is called a single-valued 
function of x; otherwise y is said to be a multiple-valued function of
x. Child weight would be an example of a multiple-valued function
of age, being different for different children. The weight of a par- 
ticular child would be a single-valued function of age. For the most 
part we shall be concerned with single-valued functions. 
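
The idea of a function as a table in which one looks up x and finds the corresponding y can be sketched in Python with a dictionary. This is an illustration only, not part of the original text; it uses the distribution of Table 11, and the names `frequency` and `f` are hypothetical.

```python
# Table 11 regarded as a single-valued function: successes x -> frequency f(x).
frequency = {0: 0, 1: 7, 2: 60, 3: 198, 4: 430, 5: 731, 6: 948,
             7: 847, 8: 536, 9: 257, 10: 71, 11: 11, 12: 0}

def f(x):
    """Look up f(x). The function is defined only for x in the table;
    any other x raises KeyError."""
    return frequency[x]

total = sum(frequency.values())
print(f(6), total)
```

No formula connecting x and f(x) is needed; the correspondence itself is the function, and it does not exist for values such as x = 6.5 that are absent from the table.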

2. Charts. A detailed study of the technique of representing data 
by broken lines, by charts or bar graphs, etc., will not be undertaken 
here. It is a rather specialized and non-mathematical subject, and 
the student interested in plain-scale cartography can readily find 
books on the subject which are very readable.¹ (A discussion of
ratio charts is given in Chapter VII.) 

Fig. 1 — Frequency Polygons for Distributions of Discrete Variates
(I: frequency polygon for the distribution of Table 10; II: frequency
polygon for the distribution of Table 11; vertical axis: frequency)

¹ For example,

(a) Graphs: How to Make and Use Them — H. Arkin and R. Colton. 2nd ed.
Harper.

(b) Engineering and Scientific Graphs for Publication. American Standards
Association, New York.

(c) Reference 8 in our Introduction.

3. Frequency Polygon. We present now a discussion of the
graphs that are used in connection with frequency distributions. A
distribution of discrete variates may be represented graphically by
plotting the points $(x_1, f_1), (x_2, f_2), \ldots, (x_n, f_n)$ and drawing a broken
line through them. Such a graph is called a frequency polygon because
it is a polygon formed by connecting the tops of a series of
ordinates whose lengths are proportional to the various frequencies
and whose abscissas correspond to the variate values of the distribution.
Figure 1 will serve as an illustration. For a table of discrete
variates the function exists only for the given values. Likewise,
its graph is discontinuous. The straight lines connecting the
points serve merely to “carry the eye,” thus giving a better idea of
the shape and position of the distribution.



4. Histogram. If the frequency distribution is one of grouped
variates (discrete or continuous) it is better to use some form of 
graphical representation which recognizes the fact that the several 
measurements in a table do not lie precisely at the class marks but 
are spread out over the intervals of which the class marks are centers. 
This may be accomplished through the use of a histogram. A histo- 
gram is a series of rectangles erected at the class boundaries with 
altitudes proportional to the respective class frequencies, and cen- 
tered on the class marks. Thus the frequencies are represented by 
areas. (See Figure 2.) If the bases are all of unit length then the 
altitudes are also equal to the frequencies. The histogram is an 
important and useful graphical device for representing frequency 
distributions. 
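
The statement that frequencies are represented by areas can be checked with a short Python sketch. It is an illustration only: the function name `histogram_rectangles` is hypothetical, the data are those of Table 6, and the heights are scaled so that each area equals the class frequency exactly (one possible choice of the constant of proportionality).

```python
import math

def histogram_rectangles(lower_boundaries, width, freqs):
    """Return (left, right, height) for each class, with the height chosen
    so that the area of the rectangle equals the class frequency."""
    return [(lb, lb + width, f / width)
            for lb, f in zip(lower_boundaries, freqs)]

freqs = [3, 21, 78, 182, 305, 209, 81, 21, 5]     # Table 6
lower = [54.5 + 10 * i for i in range(len(freqs))]  # lower class boundaries

rects = histogram_rectangles(lower, 10, freqs)
areas = [(right - left) * height for left, right, height in rects]

# The total area represents the total frequency N.
print(round(sum(areas)))
```

With bases of unit length the heights would equal the frequencies themselves, as the text remarks.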

5. Frequency Curves. The shape of the distribution may be
emphasized by constructing a continuous frequency curve such that
the areas under the curve between the ordinates at the upper and
lower boundaries of the various rectangles will equal approximately
the areas of the corresponding rectangles. Thus, in Figure 3, the
area of all the rectangles represents the total frequency 1000, and the
area of the three rectangles labeled A, B, C represents the number
of individuals weighing between 139.75 pounds and 169.75 pounds.
The dotted line represents roughly the frequency curve corresponding
to the histogram.

Fig. 3 — Histogram and Frequency Curve for a Distribution of Continuous
Variates: Frequency Distribution of the Weights of 1000 Male Students
(Table 12)

Representing each class frequency of a distribution of continuous
variates by a rectangle is equivalent to saying that we realize that
the function exists for points other than the class marks, but we do
not know what it is for these points, and so as a first approximation
we assume that the variates are uniformly distributed over each
interval, which is equivalent to regarding them as concentrated at the
class marks. If the class intervals were made smaller and smaller
and at the same time the number of variates were proportionally
increased, the upper bases of the rectangles would approach more
and more the frequency curve which represents the ideal or theoretical
mathematical function relating frequency with variate value
for the given distribution.

A frequency curve is often drawn for convenience in describing
the properties of an observed distribution, although strictly speaking,
the concept of a frequency curve is applicable only to an infinite
“universe” of continuous variates. The data at hand are supposed
to be a “sample” from the universe represented by the frequency
curve.

The more common types of distributions may be represented by
bell-shaped curves which are either symmetrical or skew. For elementary
purposes it is sufficient to consider frequency distributions
as of these two general types. In passing, we may also mention two
other types which are known as J-shaped and U-shaped. For examples
of these types see Yule and Kendall, An Introduction to the
Theory of Statistics, Ch. VI.



Fig. 4 — Ogive for Table 7 

6. Ogives. The graphs of cumulative frequencies are called
ogives. The ogive for Table 7 is shown in Figure 4 and is constructed
by plotting the points (54.5, 0), (64.5, 3), etc., as in algebra, and 
joining them with straight lines. 

The student should observe that while cum f is a function of x it is
defined for the end-x values only. Occasions will perhaps arise when
we desire the x-value corresponding to some intermediate cum f
value, say 453 in Figure 4. Conversely, we might wish to know the
cum f value for some intermediate x-value, say at x = 97. Strictly
speaking, we do not know the answer in either case, inasmuch as we
do not know how the IQ's are distributed over the interval. Perhaps
all the individual values in the interval 94.5-104.5 (say) are
less than 97; perhaps none are. The fairest assumption we can make
is that they are uniformly distributed throughout the interval. This
means graphically that we represent cum f over each interval by a
straight line, as is done in Figure 4. We may now interpolate along
this line for intermediate values. This is “straight line interpolation”
and is what the student uses when he interpolates in logarithms.

More refined methods exist for interpolating values of a function 
between the observed values but their study constitutes a separate 
branch of mathematics beyond the scope of this course. It should 
be observed that the straight line used here is a first approximation 
to the unknown function, and not merely a device to carry the eye 
as in the case of a frequency polygon for a discontinuous distribution 
of discrete variates. 
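
Straight-line interpolation on the ogive can be sketched in Python for the two cases mentioned above (x = 97 and cum f = 453). This is a minimal illustration, not part of the original text: the function name `interpolate` is hypothetical, and the inverse lookup assumes that the cum f values are strictly increasing, which holds for Table 7.

```python
from bisect import bisect_left

end_x = [54.5, 64.5, 74.5, 84.5, 94.5, 104.5, 114.5, 124.5, 134.5, 144.5]
cum_f = [0, 3, 24, 102, 284, 589, 798, 879, 900, 905]   # Table 7

def interpolate(xs, ys, x):
    """Straight-line interpolation of y at x, for strictly increasing xs."""
    # Index of the right-hand end of the interval containing x.
    i = max(1, min(bisect_left(xs, x), len(xs) - 1))
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

cum_at_97 = interpolate(end_x, cum_f, 97)     # cum f at x = 97
x_at_453 = interpolate(cum_f, end_x, 453)     # x-value at cum f = 453

print(round(cum_at_97, 2), round(x_at_453, 2))
```

Under the assumption of uniform distribution within the interval 94.5-104.5, about 360 of the children fall below an IQ of 97, and the x-value corresponding to cum f = 453 is very nearly 100.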

7. Relation of Cum f to Areas. The sum of the frequencies (cum f)
up to any value of x means, graphically, the sum of the areas of
the rectangles of the histogram up to that value. Thus in Figure 4,
the ordinate erected at x = 84.5 represents the sum of the frequencies
(3 + 21 + 78) = 102 (Figure 2). If a frequency curve represents
the distribution, then cum f, corresponding to any value of x, is the
area under the curve up to that value. Thus, in Figure 3, cum f
corresponding to x = 139.75 is approximately the area under the
smooth curve up to x = 139.75, and the total area under the curve
is cum f = N.


Exercises 

1. (a) If $f(x) = 2x^3$, exhibit $f(-x)$. Give the value of $f(3)$, of $f(-2)$.

(b) Let $f(x)$ denote a given function which is defined for all real values of
x under consideration, so that if c is any admissible number $f(c)$ is
defined. What is the graphical meaning of $f(c)$?

2. If $\phi(x) = Ke^{-x^2}$, (a) show that $\phi(x) = \phi(-x)$; (b) give the value of $\phi(0)$.

3. If $h(x) = ax^2 + bx + c$, and $h(x) = h(-x)$, show that $b = 0$.

4. If $f(x) = e^x$, show that $f(u) \times f(v) = f(u + v)$.

5. If $g(x) = \log\{(1 - x)/(1 + x)\}$, show that
$g(u) + g(v) = g\{(u + v)/(1 + uv)\}$.

6. Make a histogram for the data of Table 4. 

7. Same as exercise 6 for Table 8 or 9. 

8. Construct an ogive for the cumulative frequencies given in Table 12. 

9. Find the cumulative frequencies and construct the ogive for Table 9. 

10. For further discussion of ogive curves and their uses, read the following
references:

(a) Elements of Statistics, Davis and Nelson, pp. 23-28.

(b) The Mathematics of Statistics, Burgess, pp. 61-72.

1. It was pointed out in Chapter I that classification of the variates
of any long series is the first step necessary to overcome the
confusion of detail in the original observations, and to make comparisons
with other distributions possible. In Chapter II graphical
methods were studied which describe, to some extent, the shape and
position of the distribution. Although these methods are helpful,
their contribution is largely qualitative.

It is desirable to formulate quantitative descriptions for character- 
izing a distribution, and as an aid in this direction averages are very 
useful. They are also called measures of location. An average is a 
quantity locating a central value of the distribution. In a sense, 
it is a typical value of the whole set of variates, although it is not 
necessary that it actually have the value of one of the items of the 
set it represents. There are five averages in common use. These
are: arithmetic mean, mode, median, geometric mean, and harmonic
mean. The means and median are most frequently used, although
the arithmetic mean is by far the most important in general statistical
work, and the others are of service in special cases. We will
consider them in the order named. First, however, it will be desirable
to discuss certain symbols and notation which will facilitate the
development of formulas.

2. Notation. If x denotes a variable, then $x_1, x_2, \cdots, x_N$ are
general symbols for the values which x may take. When we are
concerned with a sum like the following,

$$x_1 + x_2 + x_3 + x_4 + \cdots + x_i + \cdots + x_N,$$

it is customary to designate it by placing the Greek capital letter
$\Sigma$ (sigma) before the general term, thus

$$\sum_{i=1}^{N} x_i = x_1 + x_2 + \cdots + x_i + \cdots + x_N.$$

The symbol $\Sigma$ is a sort of mathematical verb and the notation
written above and below it may be called adverbs. Mathematicians
call $\Sigma$ an operator and speak of the “adverbs” as limits. When
$\sum_{i=1}^{N}$ is placed before any quantity, it means, “add up all quantities
like $\cdots$ which are formed by giving i the values of every positive
integer from i = 1 to i = N, inclusive.” Thus if $x_i$ stands for “variates”
in Table 1, $x_1$ refers to the first value 75, $x_2$ refers to the second value
80, etc., and $x_N$ refers to the last value 56. Here N = 100. Hence
the compact notation $\sum_{i=1}^{100} x_i$ denotes the sum of all the variates in
Table 1. The symbol $\sum_{i=1}^{N} x_i$ is read, “the summation of x-sub-i, i
varying (or running) from one to N.” The subscript i is called the
index of summation. Any letter may be used as an index but it is
conventional to use i or j. Also the upper limit may be denoted by
any letter but we shall use N to denote the total number of variates
(some of which may be alike) in a set.

If a variable x is to take on the particular values 1, 2, 3, etc.,
instead of the general values $x_1, x_2, x_3$, etc., then x itself becomes the
index of summation and we write x = 1 underneath $\Sigma$. Thus

$$\sum_{x=1}^{N} x = 1 + 2 + 3 + \cdots + N,$$

$$\sum_{x=1}^{N} x^2 = 1 + 2^2 + 3^2 + \cdots + N^2.$$

Frequently the index of summation is understood from the context
and the notation at the top and bottom of $\Sigma$ may be omitted if no
ambiguity results.

It is imperative that the student master, as soon as possible, the
significance and utility of the $\Sigma$ notation.

Illustrations:

1. $\sum_{i=1}^{N} 3x_i = 3x_1 + 3x_2 + \cdots + 3x_N = 3(x_1 + x_2 + \cdots + x_N).$

2. $\sum_{i=1}^{5} (x_i + c) = (x_1 + c) + (x_2 + c) + (x_3 + c) + (x_4 + c) + (x_5 + c) = (x_1 + x_2 + x_3 + x_4 + x_5) + 5c.$

3. $\sum_{i=1}^{4} x_i f_i = x_1 f_1 + x_2 f_2 + x_3 f_3 + x_4 f_4.$

4. $\sum_{j=1}^{4} x_j y_j = x_1 y_1 + x_2 y_2 + x_3 y_3 + x_4 y_4.$

5. $\sum_{x=1}^{N} x^3 = 1^3 + 2^3 + 3^3 + \cdots + N^3.$

The following simple theorems will be useful in our work.

Theorem I. The summation of an algebraic sum of two or more
terms is the same algebraic sum of the summations of these terms taken separately.

In symbols:

$$\sum_{i=1}^{N} (x_i + y_i - z_i) = \sum_{i=1}^{N} x_i + \sum_{i=1}^{N} y_i - \sum_{i=1}^{N} z_i.$$

Theorem II. A constant factor may be removed from under the
summation sign and written outside as a factor. Thus,

$$\sum_{i=1}^{N} c x_i = c \sum_{i=1}^{N} x_i.$$

Proofs: It is left as an exercise for the student to prove these two
theorems by expanding the expressions.

Theorem III. If the expression under $\sum_{i=1}^{N}$ is a constant c, the expanded
result is Nc.

Examples:

1. $\sum_{i=1}^{N} c = c + c + \cdots + c = Nc.$

2. $\sum_{i=1}^{N} (x_i - c) = \sum_{i=1}^{N} x_i - \sum_{i=1}^{N} c$ by Theorem I

   $= \sum_{i=1}^{N} x_i - Nc$ by Theorem III.

The above theorems hold also if we replace the notation
$\sum_{i=1}^{N} x_i$ by $\sum_{x=1}^{N} x$, etc.
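Theorems I to III can be spot-checked numerically. The sketch below is in Python and uses made-up values for x, y, z and c (none of these numbers come from the text); it confirms each identity on that one sample and is an illustration, not a proof.

```python
# Hypothetical sample values, chosen only for illustration.
x = [2, 5, 7, 1]
y = [3, 0, 4, 6]
z = [1, 1, 2, 8]
c = 3
N = len(x)

# Theorem I: the sum of (x_i + y_i - z_i) equals sum(x) + sum(y) - sum(z).
lhs_1 = sum(xi + yi - zi for xi, yi, zi in zip(x, y, z))
rhs_1 = sum(x) + sum(y) - sum(z)

# Theorem II: a constant factor may be taken outside the summation.
lhs_2 = sum(c * xi for xi in x)
rhs_2 = c * sum(x)

# Theorem III: summing the constant c over i = 1..N gives N*c.
lhs_3 = sum(c for _ in range(N))
rhs_3 = N * c

print(lhs_1 == rhs_1, lhs_2 == rhs_2, lhs_3 == rhs_3)
```

Writing the sums as generator expressions keeps the index of summation explicit, which mirrors the role of i under the summation sign.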

The next two theorems have to do with summing integers. The
numbers used in counting,

$$1, 2, 3, 4, 5, \cdots$$

are called integers or natural numbers.

Theorem IV. The sum of the first N integers is $N(N + 1)/2$.

In symbols:

$$\sum_{x=1}^{N} x = \frac{N(N + 1)}{2}.$$

This result follows from the fact that the integers form an arithmetic
progression.
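The closed forms for the sum of the first N integers (Theorem IV) and for the sum of their squares (Theorem V) can be compared against direct addition. The following is a small Python sketch; the function names and the range of N tested are arbitrary choices made for illustration.

```python
def sum_integers(n):
    """Closed form of Theorem IV: 1 + 2 + ... + n."""
    return n * (n + 1) // 2


def sum_squares(n):
    """Closed form of Theorem V: 1^2 + 2^2 + ... + n^2."""
    return n * (n + 1) * (2 * n + 1) // 6


# Compare each closed form with term-by-term addition for small N.
for n in range(1, 51):
    assert sum_integers(n) == sum(range(1, n + 1))
    assert sum_squares(n) == sum(x * x for x in range(1, n + 1))

print(sum_integers(100), sum_squares(100))
```

Integer division is safe here because both numerators are always exactly divisible by their denominators.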

Theorem V. The sum of the squares of the first N integers is
$N(N + 1)(2N + 1)/6$.

In symbols:

$$\sum_{x=1}^{N} x^2 = \frac{N(N + 1)(2N + 1)}{6}.$$

Proof: Let us take the identity $x^3 - (x - 1)^3 = 3x^2 - 3x + 1$,
and sum each side for x = 1 to N. Thus,

$$\sum_{x=1}^{N} [x^3 - (x - 1)^3] = \sum_{x=1}^{N} [3x^2 - 3x + 1].$$

Applying Theorems I-III to the right member we have

$$\sum_{x=1}^{N} [x^3 - (x - 1)^3] = 3\sum_{x=1}^{N} x^2 - 3\sum_{x=1}^{N} x + N.$$

Performing the indicated sum in the left member, we have

$$(1^3 - 0^3) + (2^3 - 1^3) + \cdots + [N^3 - (N - 1)^3] = N^3.$$

Therefore

$$N^3 = 3\sum_{x=1}^{N} x^2 - 3\sum_{x=1}^{N} x + N.$$

Hence, using Theorem IV and simplifying,

$$\sum_{x=1}^{N} x^2 = \frac{2N^3 + 3N(N + 1) - 2N}{6},$$

whence

$$\sum_{x=1}^{N} x^2 = \frac{N(N + 1)(2N + 1)}{6}.$$

3. Arithmetic Mean. The arithmetic mean of a set of variates is
defined as the sum of the variates divided by their number. We are
thinking now of a set of ungrouped variates, like that of Table 1. If
we use the symbol $\bar{x}$ to represent the arithmetic mean of the N
variates $x_1, x_2, x_3, \cdots, x_N$, then

$$\bar{x} = \frac{1}{N}(x_1 + x_2 + x_3 + \cdots + x_N),$$

or using the more compact notation of the preceding section, we have

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i. \tag{1}$$

Each item in the set is thus represented in the arithmetic mean in
proportion to its magnitude.

As an illustration, it is easily verified that for the set of grades given
in Table 1,

$$\bar{x} = \frac{7267}{100} = 72.67.$$

Computing the mean¹ strictly according to definition (1) may be
called the serial method to distinguish it from other methods which
will be presented. This definition is applicable when N is so small
that a grouping of the variates into a frequency distribution is not
feasible.

If x refers to the integers from 1 to N their mean is

$$\bar{x} = \frac{1}{N}\sum_{x=1}^{N} x. \tag{1a}$$

4. Weighted Arithmetic Mean. It will be noticed that several of
the grades given in Table 1 are alike. For example, 80 occurs seven
times. It should be evident that the same result would be found for
the mean if, instead of summing the individual values, each value was
first multiplied by the frequency with which it occurs and all such
products were then added. In general, if the values $x_1, x_2, \cdots, x_k$
occur with corresponding frequencies $f_1, f_2, \cdots, f_k$, respectively,
where $f_1 + f_2 + \cdots + f_k = N$, it follows that

$$\bar{x} = \frac{x_1 f_1 + x_2 f_2 + \cdots + x_k f_k}{f_1 + f_2 + \cdots + f_k}$$

or, in shorter notation

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{k} f_i x_i, \quad \text{where } N = \sum_{i=1}^{k} f_i. \tag{2}$$

When obtained in this way, $\bar{x}$ is generally called a weighted arithmetic
mean. The term originated in experimental science where
some readings which have been made under more favorable conditions
are “weighted” according to their reliability or importance. When
the weights have been chosen, they become, essentially, frequencies.

If the x's are added individually, the f's become unity, and equation
(2) reduces to (1). The student should notice that, for the
same data, $\sum_{1}^{k} f_i x_i$ is numerically equal to $\sum_{1}^{N} x_i$. He should also
observe that N refers to the number of variates in the set (some of
which may be alike), whereas k refers to the number of different values
of x in the set and hence to the number of products of the form $x_i f_i$
where $f_i$ is the number of times $x_i$ occurs. In the following example,
N = 8 and k = 4.

¹ When there is no ambiguity, the arithmetic mean is often referred to as the
mean.

Example. For the values 6, 8, 7, 6, 5, 7, 6, 5,

$$\sum_{i=1}^{8} x_i = x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 = 6 + 8 + 7 + 6 + 5 + 7 + 6 + 5 = 50.$$

$$\sum_{i=1}^{4} f_i x_i = f_1 x_1 + f_2 x_2 + f_3 x_3 + f_4 x_4 = 2 \cdot 5 + 3 \cdot 6 + 2 \cdot 7 + 1 \cdot 8 = 50, \qquad \sum_{i=1}^{4} f_i = 8.$$

By either method, $\bar{x} = 50/8 = 6.25$.
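The agreement between the serial method (1) and the weighted form (2) can be reproduced on the eight values of the example. This is a minimal Python sketch; the use of `collections.Counter` to tally the frequencies is a convenience chosen here, not something prescribed by the text.

```python
from collections import Counter

values = [6, 8, 7, 6, 5, 7, 6, 5]    # the eight variates of the example

# Serial method, definition (1): add every variate and divide by N.
N = len(values)
mean_serial = sum(values) / N

# Weighted method, formula (2): each distinct value times its frequency.
freq = Counter(values)               # maps each distinct value to its count
k = len(freq)                        # number of distinct values
mean_weighted = sum(f * x for x, f in freq.items()) / sum(freq.values())

print(N, k, mean_serial, mean_weighted)
```

Both means come out the same because grouping equal values only reorders the terms of the sum.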


Exercises . 


1. Write in expanded form: 

(«)3 £»/<; (f>) £x<y ( ; 


»»1 


t- 1 


2. Write in expanded form: 


(o) Z/<; 
1 


nt±nt 

(b) E fH 

*-n,+ 1 


(c) Ete-*)/<. 

x-1 


ni m+ni 

(c) E^tfi + E X *fi 

♦ -1 1-ni+l 


3. Express 2(c) as a single summation, if $n_1 + n_2 = k$.

4. Write in the abbreviated form, using $\Sigma$:

(a) $x_1 f_1 + x_2 f_2 + \cdots + x_k f_k.$

(b) $(x_1 - \bar{x})f_1 + (x_2 - \bar{x})f_2 + \cdots + (x_k - \bar{x})f_k.$

(c) $\frac{1}{N}[(x_1 - \bar{x})^2 f_1 + (x_2 - \bar{x})^2 f_2 + \cdots + (x_k - \bar{x})^2 f_k].$

5. Prove:

(a) $\sum_{1}^{k} (x_i + 1)^2 f_i = \sum_{1}^{k} x_i^2 f_i + 2\sum_{1}^{k} x_i f_i + N.$

(b) $\sum_{x=0} x(x - 1)p = \sum_{x=2} x(x - 1)p.$

6. Compute the value of exercise 1 (c) for the example in §4, using the following
form:

   $x_i$    $f_i$    $(x_i - \bar{x})$    $(x_i - \bar{x})f_i$
    5       2        -1.25            -2.50
    6       3
    7       2          ?                ?
    8       1

$\sum (x_i - \bar{x})f_i = ?$

7. Distinguish between $\sum x_i y_i$ and $\sum x_i \sum y_i$. Write in expanded form.

8. (a) Express in $\Sigma$ notation: Each different variate is multiplied by its own
f and the sum of the results is divided by N.

(b) Give word statements of the expressions in Exercise 4.

(c) Express the general polynomial of degree n in x,

$$a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$$

in $\Sigma$ notation.

9. Using the identity

derive the result 


by a method analogous to the proof of Theorem V. 

10. (a) Express in abbreviated notation: The sum of the squares of the x's
divided by the square of their sum.

(b) If x refers to the integers from 1 to N, evaluate your answer to (a) in
terms of N.

(c) Show that the mean of the first N integers is (N + 1)/2.

5. Arithmetic Mean from Frequency Table. The variates in each
class interval of a frequency distribution are assumed to have the
value of the class mark for that interval. Therefore, we may use
formula (2) to find the mean of a frequency distribution. In this
case, $x_i$ represents the mid-value of the ith class interval, $f_i$ the
corresponding frequency, and k the number of intervals; i running from 1
to k. The method of applying (2) is illustrated in Table 14 from the
data of Table 2.


Table 14

  Class Interval    Class Mark x    Frequency f    Product fx
      30-39             34.5             2             69.0
      40-49             44.5             3            133.5
      50-59             54.5            11            599.5
      60-69             64.5            20           1290.0
      70-79             74.5            32           2384.0
      80-89             84.5            25           2112.5
      90-99             94.5             7            661.5
      Totals                          N = 100     Σfx = 7250.0

$$\bar{x} = \frac{7250}{100} = 72.50.$$


If we denote the class interval by c then it is obvious that c = 10 in Table 14. 


In this connection it is interesting to note that our result here 
differs very little from the true value 72.67 and therefore our assump- 
tion that all values in a given class may be taken as the class mark 
seems to cause little error in the result obtained for the mean. This 
can be proved mathematically (under certain assumptions) and will 
be referred to later. 
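The computation of Table 14 can be reproduced directly from formula (2). The sketch below is in Python; the variable names are illustrative, and the class marks and frequencies are those of Table 14.

```python
# Class marks and frequencies as listed in Table 14.
class_marks = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [2, 3, 11, 20, 32, 25, 7]

N = sum(frequencies)                                   # total frequency
products = [f * x for x, f in zip(class_marks, frequencies)]
grouped_mean = sum(products) / N                       # formula (2)

print(N, sum(products), grouped_mean)
```

The result, 72.5, is the grouped approximation; the text gives 72.67 for the ungrouped grades of Table 1.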

6. Translation of Axes; Deviations. It is frequently useful to 
employ the methods and results of geometry in connection with the 
problems of statistics. Foremost among these methods is the repre- 
sentation of numbers by points on a line; an origin and a unit of 
measure having been chosen, a coordinate is assigned to each point on 
the line. When a frequency distribution is represented by a graph,
we have seen in Chapter II that the variate values are used as abscissas
or measurements along the x-axis. The mean is therefore the
point on the x-axis whose coordinates are $(\bar{x}, 0)$. Its position may
be emphasized by drawing a vertical line through this point, but it is
the horizontal distance of the point from the origin and not the
vertical line which represents graphically the mean.

In discussing the variates we may often work with smaller numbers
by changing the origin of reference. If new axes, x', y', are taken
parallel to the old axes, x, y, with positive directions preserved, the
axes are said to be translated from one position to the other. A
translation of axes corresponds to a transformation of coordinates. Thus if
we let

$$x' = x - x_0, \qquad y' = y - y_0$$

the origin is translated to the point $(x_0, y_0)$. Since the variates are
denoted by x we are concerned here only with the transformation
$x' = x - x_0$ which translates the origin to the point $(x_0, 0)$. The
variates referred to a new origin are often called deviations. In
particular if we translate the origin to the mean by letting

$$x' = x - \bar{x},$$

then for a frequency distribution the deviations are the values
obtained by subtracting $\bar{x}$ from each of the class marks. Thus,

$$x'_1 = x_1 - \bar{x}, \qquad x'_2 = x_2 - \bar{x}, \qquad \cdots, \qquad x'_k = x_k - \bar{x}.$$

The units of measurement remain unchanged. Figure 5 shows the
two systems when the axes are translated to $(\bar{x}, 0)$. Obviously, any
variates that are larger than $\bar{x}$ will be positive in terms of x' and any
variates smaller than $\bar{x}$ will be negative in terms of x'.

7. Properties of $\bar{x}$. There are two important properties of $\bar{x}$
which may be stated in the following theorems:

Theorem VI. The algebraic¹ sum of the deviations of all the variates
from their arithmetic mean is zero.

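Proof: Let x' represent a deviation from the mean. Multiplying
each different deviation by the number of times it occurs and adding
these products we have,

$$\sum_{1}^{k} f_i x'_i = \sum_{1}^{k} f_i (x_i - \bar{x})$$

$$= \sum_{1}^{k} f_i x_i - \sum_{1}^{k} f_i \bar{x} \quad \text{by Theorem I}$$

$$= \sum_{1}^{k} f_i x_i - \bar{x} \sum_{1}^{k} f_i \quad \text{by Theorem II.}$$

Recalling from (2) that $\sum_{1}^{k} f_i x_i = N\bar{x}$, and that $\sum_{1}^{k} f_i = N$, we have

$$\sum_{1}^{k} f_i (x_i - \bar{x}) = N\bar{x} - \bar{x}N = 0. \tag{3}$$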
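Theorem VI can be checked on a concrete frequency distribution. The following Python sketch reuses the class marks and frequencies of Table 14 as sample data; any other distribution would serve equally well, and the variable names are illustrative only.

```python
# Sample data: class marks and frequencies of Table 14.
class_marks = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [2, 3, 11, 20, 32, 25, 7]

N = sum(frequencies)
x_bar = sum(f * x for x, f in zip(class_marks, frequencies)) / N

# Each deviation from the mean, weighted by how often it occurs.
weighted_deviations = [f * (x - x_bar) for x, f in zip(class_marks, frequencies)]
total_deviation = sum(weighted_deviations)

print(x_bar, total_deviation)
```

With data that are not exactly representable in binary the total may differ from zero by a rounding error, so a tolerance is the safer comparison in general.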

i 

Theorem VII. If the variates are referred to a new origin $x_0$ and
expressed in units of c by means of the transformation

$$u = \frac{x - x_0}{c}, \quad (c \neq 0), \tag{4}$$

then the old mean, $\bar{x}$, is related to the new mean, $\bar{u}$, by the following
formula:

$$\bar{x} = c\bar{u} + x_0. \tag{5}$$

Proof: From (4),

$$x = cu + x_0 \tag{4a}$$

and substitution of this value for x in definition (2) gives

$$\bar{x} = \frac{1}{N}\sum_{1}^{k} f_i (cu_i + x_0).$$

By Theorems I and II this equals

$$c \cdot \frac{1}{N}\sum_{1}^{k} f_i u_i + x_0 \cdot \frac{1}{N}\sum_{1}^{k} f_i.$$

But the first of these expressions is, by definition, c times the mean
value of u, and the second is, from (2), simply $x_0$. Therefore

$$\bar{x} = c\bar{u} + x_0.$$

This is an important relation and its derivation should be mastered.
Observe that the size of a u-unit will be c times as large as the size
of an x-unit.

Corollary. If the mean of the deviations of the variates from any
arbitrary number, $x_0$, is found and added algebraically to $x_0$, the result is
the mean $\bar{x}$. In symbols,

$$\bar{x} = \frac{1}{N}\sum_{1}^{k} f_i (x_i - x_0) + x_0. \tag{6}$$

The proof follows from (4) and (5).

In (5) and (6), $x_0$ may be regarded as a provisional mean, and the
first term in the right members may be thought of as the correction
to be added algebraically to the provisional mean in order to get the
true mean.

¹ That is, taking account of signs. Some of the deviations will be positive and
some negative.

8. Short Methods of Computing $\bar{x}$. In certain cases, the method
of computing the mean by (2), as shown in Table 14, can be simplified
by use of Theorem VII.

Case I (class intervals equal). If the class marks are equispaced,
let c equal the class interval and choose $x_0$ as one of the class marks,
usually the one opposite the largest frequency. From (4), $x_0$ becomes
the origin of u, because when $x = x_0$, $u = 0$.

The method of using (5) is illustrated in Table 15. Here
c = 10 and we choose $x_0 = 74.5$, so (4) becomes

$$u = \frac{x - 74.5}{10}.$$

Substituting the given values of x in this relation we get the values in
the u column. So in running the fu column, small values of u are
multipliers of the larger values of f. Then

$$\bar{u} = \frac{1}{100}\sum fu = \frac{-20}{100} = -.2,$$

so from (5),

$$\bar{x} = 10(-.2) + 74.5 = 72.5\%.$$

It should be evident that the final value obtained for the mean is
independent of the choice of the arbitrary value $x_0$. This choice is
only a rough guess and it is really immaterial which of the given
values is selected as $x_0$, except that the nearer it is to the mean the
lighter will be the calculations to follow. A check on the arithmetic
may, therefore, be effected by selecting a different provisional mean.

Table 15. Mean of 100 Grades Using Class Interval as Unit

     x        u        f        fu
    34.5     -4        2        -8
    44.5     -3        3        -9
    54.5     -2       11       -22
    64.5     -1       20       -20
    74.5      0       32         0
    84.5      1       25        25
    94.5      2        7        14
    Totals           100       -20


This indirect method is sometimes called coding because the vari- 
ates are coded to another scale in which it is easier to compute the 
mean. Formula (5) is the relation, then, for transforming the mean 
from one scale to another. 
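The coding procedure of Table 15 can be written out as a short routine. The Python sketch below assumes equispaced class marks; the function name `coded_mean` is a hypothetical helper introduced here for illustration. It also repeats the computation with a different provisional mean to show that the result does not depend on the choice of $x_0$.

```python
# Class marks and frequencies as listed in Table 15.
class_marks = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
frequencies = [2, 3, 11, 20, 32, 25, 7]


def coded_mean(marks, freqs, x0, c):
    """Return (u_bar, x_bar) using u = (x - x0)/c and x_bar = c*u_bar + x0."""
    n = sum(freqs)
    u = [(x - x0) / c for x in marks]                  # coded values, relation (4)
    u_bar = sum(f * ui for ui, f in zip(u, freqs)) / n
    return u_bar, c * u_bar + x0                       # relation (5)


u_bar, x_bar = coded_mean(class_marks, frequencies, x0=74.5, c=10)
# A different provisional mean serves as a check on the arithmetic.
_, x_bar_check = coded_mean(class_marks, frequencies, x0=64.5, c=10)

print(u_bar, x_bar, x_bar_check)
```

Choosing $x_0$ near the mean keeps the coded values small, which mattered for hand computation; for a program the choice only affects rounding.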

If one’s statistical interests are limited to computing means, then 
(2) cannot be improved upon if calculating machines are to be used. 
It should be understood, however, that techniques must be devel- 
oped now for subsequent purposes. The indirect method is part of 
a pattern which is useful in later chapters. From this standpoint,
one should practice using it at this stage when $N = \sum_{1}^{k} f_i$ is large
and the x's are equispaced.

Case II (class intervals unequal ). Occasionally a frequency dis- 
tribution is encountered in which the variates are not equispaced; 
it is then usually best to take c = 1 (unless the x’s have a common 
factor c) and be content with whatever simplification results from a 
suitable choice of x 0 . This is equivalent to using the above corollary. 

In Table 16, we choose $x_0 = 200$ and are thus able to simplify the
work a little.

Table 16

      x         f          u           fu
    106.12      7       -93.88      -657.16
    191.83     14        -8.17      -114.38
    246.48     32        46.48      1487.36
    283.63     49        83.63      4097.87
    257.65     55        57.65      3170.75
    294.51     54        94.51      5103.54
    222.53     35        22.53       788.55
     71.43     14      -128.57     -1799.98
    Totals    260                  12076.55

$$\bar{u} = \frac{1}{N}\sum f(x - 200) = \frac{12076.55}{260} = 46.448$$

$$\bar{x} = \bar{u} + x_0 = 246.45.$$
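The corollary (c = 1, provisional mean $x_0 = 200$) can be checked against the direct formula (2) for the data of Table 16. In this Python sketch the frequency for x = 257.65 is taken as 55, an assumption made because that value agrees with the printed product 3170.75 and with the printed total N = 260.

```python
# Values and frequencies of Table 16 (frequency 55 for 257.65 is assumed,
# being the value consistent with the printed product and total).
x = [106.12, 191.83, 246.48, 283.63, 257.65, 294.51, 222.53, 71.43]
f = [7, 14, 32, 49, 55, 54, 35, 14]
x0 = 200                                   # provisional mean

N = sum(f)
# Mean of the deviations from x0, i.e. the correction term in (6).
u_bar = sum(fi * (xi - x0) for xi, fi in zip(x, f)) / N
x_bar = u_bar + x0
# Direct computation by (2), for comparison.
x_bar_direct = sum(fi * xi for xi, fi in zip(x, f)) / N

print(N, round(u_bar, 3), round(x_bar, 2), round(x_bar_direct, 2))
```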


9. Geometric Explanation. Let us consider further the relation
between the variables x and u, defined by the expression

$$u = \frac{x - x_0}{c}. \tag{4}$$

A geometric explanation will be helpful.

Graphically, the x values are distances along the x-axis measured
from zero as origin. Likewise $x_0$ is some point on the x-axis at a
distance of $x_0$ units from zero. If now the points representing the
x values are measured from $x_0$ as origin they are denoted by $x - x_0$.
(See Figure 6.) Thus if $x_0 = 24$, a value which is 36 with reference
to the origin of x will be 12 with reference to $x_0$; likewise a value
x = 18 becomes $x - x_0 = -6$ when referred to $x_0$ as origin. It
should be noted that $x - x_0$ is in the same units as x. Thus if x is in
inches, $x - x_0$ will also be in inches. But $(x - x_0)/12$ would then
be in feet. Instead of dividing by 12 suppose we divide by c. Then
$(x - x_0)/c$ will be in units of c whatever c may be. It is convenient
to denote the resulting values by a different letter, say u. Therefore
the numerator of (4) changes the origin of reference but does
not affect the scale of measurement. The denominator changes the
scale, there being c of the x units in one of the u units. Relation (4)
has this generalized meaning apart from statistics. Mathematical
notation is applicable to many different fields of knowledge. A relation
like (4) which occurs in physics is C = (5/9)(F - 32); it connects
temperature on the Centigrade and Fahrenheit scales.

Fig. 6

When (4) is applied to a frequency distribution it is convenient to
select $x_0$ as one of the mid-x values and to take c as the width of the
class intervals. Under Case I, the mean is found with reference to $x_0$
and in units of c. This is the mean, $\bar{u}$, of the numbers representing the
various class intervals weighted with the corresponding frequencies.
After this mean is computed it may be converted back into units of x
by multiplying by c, and then referred to the origin of x by adding $x_0$.
(See Figure 7.) Hence we have $\bar{x} = c\bar{u} + x_0$. Thus we arrive at the
same result as that obtained algebraically.

Fig. 7. If $x_0 < \bar{x}$, $c\bar{u}$ is positive; if $x_0 > \bar{x}$, $c\bar{u}$ is negative.

If we had denoted the variates by y we could have used the relation

$$v = \frac{y - y_0}{c}$$

corresponding to (4). Geometrically, this would mean a change of
units and a translation of origin in the y-direction. The relation
corresponding to (5) would then be

$$\bar{y} = c\bar{v} + y_0$$

where $\bar{v} = \frac{1}{N}\sum f_i v_i$.

As the short-cut method is an important one, another illustration
is given in Table 17 (based on Table 4). Here we take
$u = (x - 2.745)/0.5 = 2(x - 2.745)$.

Table 17. Computation of Mean Monthly Rainfall at Iowa City, 1890-1925

       x            f        u        fu
     0.245         23       -5      -115
     0.745         42       -4      -168
     1.245         58       -3      -174
     1.745         62       -2      -124
     2.245         49       -1       -49
     2.745 = x_0   47        0         0
     3.245         32        1        32
     3.745         27        2        54
     4.245         18        3        54
     4.745         15        4        60
     5.245         14        5        70
     5.745          7        6        42
     6.245         10        7        70
     6.745          5        8        40
     7.245          6        9        54
     7.745          5       10        50
     8.245          3       11        33
     8.745          2       12        24
     9.245          5       13        65
     9.745          0       14         0
    10.245          1       15        15
    10.745          1       16        16
    Totals        432                 49

$$\bar{x} = 2.745 + \frac{(0.5)(49)}{432} = 2.802 \text{ inches.}$$



10. Mean of Means. So far we have used subscripts to distinguish
between the variates within a set: $x_1, x_2, \cdots, x_N$. By this time
the student should be thinking easily in this notation so we may now
state an additional use of subscripts. Instead of using x and y to
distinguish between two sets of variates we may use $x_1$ and $x_2$. Then
to distinguish the variates within a set we would add a second
subscript, so for the $x_1$ set the variates are

$$x_{11}, x_{12}, x_{13}, \cdots, x_{1n_1}$$

and for the $x_2$ set the variates are

$$x_{21}, x_{22}, x_{23}, \cdots, x_{2n_2}.$$

These are read “x two one,” etc., not “x twenty-one,” etc. In the
notation dealing with one set, x was a variable but $x_1, x_2$, etc., were
constants. Now $x_1$ and $x_2$ are variables and $x_{11}, x_{12}, \cdots, x_{21}, x_{22}, \cdots$,
etc., are constants. Thus $x_1$ and $x_2$ may denote the grades of two
sections of mathematics in which there are $n_1$ and $n_2$ students
respectively. Then the mean of the first set is

$$\bar{x}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} x_{1i} \tag{a}$$

and the mean of the second set is

$$\bar{x}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} x_{2i}. \tag{b}$$

We will now state a useful theorem.

Theorem VIII. If the mean of a set of $n_1$ variates is $\bar{x}_1$ and the mean
of another set of $n_2$ variates is $\bar{x}_2$, the mean $\bar{x}$ of the combined sets is

$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{N} \tag{7}$$

where $N = n_1 + n_2$.

Proof: It is obvious from equations (a) and (b) that

$$n_1\bar{x}_1 + n_2\bar{x}_2 = \sum_{1}^{n_1} x_{1i} + \sum_{1}^{n_2} x_{2i}. \tag{c}$$

If x is allowed to stand for $x_1$ and $x_2$ in succession as shown in the
accompanying table, then the right member of (c) may be written
$\sum_{1}^{n_1+n_2} x_i$, which denotes the sum of all the variates when they are
combined into one set. If this latter sum is divided by the total number
of variates N the result is, by definition, their mean. Hence

$$\frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{\sum_{1}^{n_1} x_{1i} + \sum_{1}^{n_2} x_{2i}}{n_1 + n_2} = \bar{x}.$$

       i            $x_1$            x
       1            $x_{11}$         $x_1$
       2            $x_{12}$         $x_2$
       3            $x_{13}$         $x_3$
       ...          ...              ...
       $n_1$        $x_{1n_1}$       $x_{n_1}$

       i            $x_2$            x
       $n_1 + 1$    $x_{21}$         $x_{n_1+1}$
       $n_1 + 2$    $x_{22}$         $x_{n_1+2}$
       $n_1 + 3$    $x_{23}$         $x_{n_1+3}$
       ...          ...              ...
       $n_1 + n_2$  $x_{2n_2}$       $x_{n_1+n_2}$

       Sums:  $\sum_{i=1}^{n_1} x_{1i} + \sum_{i=1}^{n_2} x_{2i} = \sum_{i=1}^{n_1+n_2} x_i$

We may express (7) in more compact notation as follows.

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{2} n_i \bar{x}_i, \qquad N = \sum_{i=1}^{2} n_i.$$


This form lends itself to a generalization for k sets so we have the
following theorem.

Theorem IX. The mean of a set of N variates which is composed of k
subsets is

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{k} n_i \bar{x}_i \tag{8}$$

where $\bar{x}_i$ is the mean and $n_i$ is the frequency in the ith subset and
$N = \sum_{i=1}^{k} n_i$.

Corollary. If $n_i = n$ is the same for all the sets, then N = kn and
(8) reduces to

$$\bar{x} = \frac{1}{k}\sum_{1}^{k} \bar{x}_i. \tag{9}$$
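Formula (8) is easy to try on small sets. In the Python sketch below the two groups of numbers are invented for illustration, and `combined_mean` is a hypothetical helper name; the example also shows that the plain average of the subset means, formula (9), is not the combined mean when the subsets differ in size.

```python
def combined_mean(sizes, means):
    """Mean of the combined set by (8): sum of n_i * mean_i divided by N."""
    total = sum(sizes)
    return sum(n * m for n, m in zip(sizes, means)) / total


# Two invented subsets of unequal size.
group_1 = [70, 80, 90]
group_2 = [50, 60]

sizes = [len(group_1), len(group_2)]
means = [sum(group_1) / len(group_1), sum(group_2) / len(group_2)]

by_formula = combined_mean(sizes, means)
by_pooling = sum(group_1 + group_2) / (len(group_1) + len(group_2))
unweighted = sum(means) / len(means)      # formula (9); valid only for equal sizes

print(by_formula, by_pooling, unweighted)
```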

Exercises

1. (a) Use (1), §3, to find the mean of the following numbers: 18, 42, 23, 16,
103, 61, 49, 95, 113, 10.

(b) For the numbers in (a) verify that the sum of their deviations from their
mean is zero. What theorem does this exercise illustrate?

2. Find the deviations of the numbers in Ex. 1 from 50 and verify that the mean
of these deviations added algebraically to 50 gives the mean of the numbers
themselves.

3. Prove: The sum of the deviations of the variates from their mean is zero.

4. Derive the relation $\bar{x} = c\bar{u} + x_0$.

5. Find the arithmetic mean of the weights of 1000 students given in Table 12.
Use (5). Ans. 138.65 lbs.

6. Find the mean monthly rainfall at Des Moines from 1890 to 1925, using the
frequency distribution which you previously made. Ans. 2.55 inches.

7. Find the mean of the distribution of discrete variates given in Table 11.

8. Prove the following theorem: The mean of a set of variates is unchanged if
each variate is replaced by the mean of all the variates.

9. (a) Prove expressions (8) and (9).

(b) The mean grade of one class of 20 students is 76% and of another class of
15 students is 80%. Find the mean of the two classes.

10. The record of freshman scholastic averages for a semester at a certain
university was given as follows:

               $n_i$     $\bar{x}_i$
    Men        501      3.550
    Women      356      3.639

Find the mean grade for the entire class.

11. Assume that the following fictitious data represent the earnings per week of a
certain type of machine shop labor in Illinois establishments:

          Wage Group             Frequency
    $00.0   under   $10.0            50
     10.0            20.0           150
     20.0            30.0           400
     30.0            40.0           200
     40.0            50.0           160
     60.0            80.0*           40
     Total                         1000

* Class omitted. Note the different interval in the last class.

The average earnings per week for this same type of labor in all other states of
the United States where 9000 men are employed, not counting those in Illinois, are
$30.00 per week.

Compute the arithmetic mean wage (a) for Illinois, (b) for the entire United
States.

Recompute the mean wage for Illinois in such a manner as to check, in the
quickest and surest way, the accuracy of the result found in (a) above.

12. Find the mean of the following distribution:

      x        f
     47.5      7
     48.1     17
     45.9     46
     44.0     44
     40.7     54
     41.6     43
     38.0     35
     33.2     14

11. The Mode. That value of the variable which occurs most
frequently is called the mode. Its chief service is in characterizing
a type and it is the kind of average meant by such a phrase as the
“average man.” There is some difficulty in giving a precise definition
of the mode without more advanced mathematics. However,
we may say that for a given grouping an approximate value, which
we will call the empirical mode, is given by the class mark having the
largest frequency.¹ Thus, in Table 17 the empirical mode is 1.745
inches.
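The empirical mode is simply the class mark opposite the largest frequency. The Python sketch below applies that rule to the class marks and frequencies of Table 17; if several classes tied for the largest frequency, this sketch would report only the first, which is a simplification assumed here.

```python
# Class marks 0.245, 0.745, ..., 10.745 and frequencies as in Table 17.
class_marks = [round(0.245 + 0.5 * i, 3) for i in range(22)]
frequencies = [23, 42, 58, 62, 49, 47, 32, 27, 18, 15, 14,
               7, 10, 5, 6, 5, 3, 2, 5, 0, 1, 1]

# Position of the largest frequency (first occurrence if there are ties).
largest = max(frequencies)
empirical_mode = class_marks[frequencies.index(largest)]

print(largest, empirical_mode)
```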

12. The Median. Instead of finding the mean, suppose the N 
variates are arranged in the order of their magnitude. The median is 
defined as the value which is greater than half the variates and less 
than the other half. A more precise definition is as follows: 

Let $x_1, x_2, \cdots, x_N$ be a set of real numbers, which may or may not
be all different, and suppose they are arranged in order of magnitude
so that

$$x_1 \le x_2 \le x_3 \le \cdots \le x_N.$$

Whenever N is odd, N = 2k - 1, the median is $x_k$, the middle one of
the x's. If N is even, N = 2k, the median is not uniquely defined
unless $x_k = x_{k+1}$, in which case the median is this common value.
Otherwise, the definition is satisfied by any value of x belonging to
the interval

$$x_k \le x \le x_{k+1},$$

and the median is to this extent indeterminate. In this case it is
conventional to take

$$\tfrac{1}{2}(x_k + x_{k+1})$$

as the median.

Example. Find the median of the following set of numbers: 10, 6, 5, 25, 15, 18,
20.

Arranging them in order of magnitude we find the median to be 15 (the mean is
14.14). If we add another value, 37, to make N even, the median is ½(15 + 18) =
16.5 (the mean is 17).
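The definition, including the convention for even N, translates into a few lines of code. This is a Python sketch with a hypothetical function name `median_of`; it is applied to the seven numbers of the example and then to the eight numbers obtained by adding 37.

```python
def median_of(values):
    """Median by the definition above; averages the two middle values if N is even."""
    ordered = sorted(values)              # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                        # N = 2k - 1: the middle value x_k
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # N = 2k: (x_k + x_{k+1}) / 2


numbers = [10, 6, 5, 25, 15, 18, 20]
median_odd = median_of(numbers)
median_even = median_of(numbers + [37])
mean_odd = sum(numbers) / len(numbers)
mean_even = sum(numbers + [37]) / (len(numbers) + 1)

print(median_odd, round(mean_odd, 2), median_even, mean_even)
```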

13. Median of a Frequency Distribution. Case I. For a frequency
distribution of continuous variates, the median is defined as
follows:

Definition: The median is the value of x for which cum f = N/2.

Given such a frequency distribution we may therefore find its
median by forming a cumulative frequency table and interpolating in
the end-x column for the value of x corresponding to N/2.
The method should be clear from the following illustration.

¹ Another method of computing the mode will be given in a later section.



Find the median for the data of Table 2.

    Interval      f       End-x      Cum f
                           29.5         0
     30-39        2        39.5         2
     40-49        3        49.5         5
     50-59       11        59.5        16
     60-69       20        69.5        36
     70-79       32        Md    <--   50
                           79.5        68
     80-89       25        89.5        93
     90-99        7        99.5       100


Here, N/2 = 50. This value of cum f corresponds to a value of x 
in the interval 69.5-79.5. Therefore the median is 69.5 plus a frac- 
tion of the distance from 69.5 to 79.5. Thus, 


     End-x       Cum f
      69.5         36
     Median        50
      79.5         68

Here $d_1$ = Median - 69.5 and $D_1$ = 79.5 - 69.5 are the partial and total
differences in end-x, and $d_2$ = 50 - 36 and $D_2$ = 68 - 36 are the
corresponding differences in cum f.

Assuming that the items in any class interval are uniformly distributed
over that interval, it follows that the partial differences are
proportional to the total differences: $d_1/D_1 = d_2/D_2$. That is,

$$\frac{\text{Median} - 69.5}{79.5 - 69.5} = \frac{50 - 36}{68 - 36},$$

whence,

$$\text{Median} = 69.5 + 10\left(\frac{14}{32}\right) = 69.5 + 4.4 = 73.9.$$

This is called “straight line interpolation” or “interpolation by
proportional parts.” The reason for these names is made clear in the
following diagram.

Triangle ABC is similar to triangle AED, so that

$$\frac{AB}{AE} = \frac{BC}{ED},$$

$$x = AB = \frac{AE \cdot BC}{ED} = \frac{10(50 - 36)}{68 - 36} = 10\left(\frac{14}{32}\right) = 4.4,$$

$$\therefore \; Md = 69.5 + x = 73.9.$$


The following formula may also be used to compute the median:

$$\text{Md} = {}_{e}x_m + c\,\frac{N/2 - {}_{b}f_m}{f_m},$$

where ${}_{e}x_m$ is the lower end-value of the median class, N is the total
frequency, ${}_{b}f_m$ the number of variates below the median class, c the
class interval, and $f_m$ the frequency of the median class.
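The interpolation just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `grouped_median` and its argument names are ours, not the text's, and the sketch assumes equal class intervals and the uniform spread of items within a class that the text assumes.

```python
def grouped_median(lower_ends, freqs, width):
    """Median of a grouped frequency distribution by straight-line interpolation.

    lower_ends: lower end-value of each class, in ascending order
    freqs:      class frequencies, in the same order
    width:      common class interval c (assumed equal for all classes)
    """
    total = sum(freqs)
    half = total / 2
    below = 0  # cumulative frequency below the current class
    for lower, f in zip(lower_ends, freqs):
        if f > 0 and below + f >= half:
            # This is the median class: interpolate within it.
            return lower + width * (half - below) / f
        below += f
    raise ValueError("empty distribution")


# Data of Table 2 as given in the text.
lower_ends = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freqs = [2, 3, 11, 20, 32, 25, 7]
median = grouped_median(lower_ends, freqs, 10)
print(round(median, 1))  # 73.9
```

The unrounded value is 69.5 + 10(14/32) = 73.875, which the text reports as 73.9.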

Case II. In the case of a set of discrete variates there may be no 
value in the set such that the number of variates which are larger than 
it is equal to the number less than it. Thus in Table 11 the values of x 
are integers and 35% of the throws yielded 5 or fewer successes and 
65% yielded 6 or more successes. Neither x = 5 nor x = 6, nor any 
integer, will exactly split the total frequency into two equal parts. 
Of course a formal application of the definition given in Case I will 
give a value of x for which cum f is N/2. The difficulty is not so 
much in the interpretation of the fractional result because the same 
objection could be cited against the mean. But the real difficulty 
lies in explaining interpolation in a discontinuous function. We 
cannot assume that the given frequencies are distributed over the 
interval from one value of x to the next. Perhaps the best we can do 
in such cases is to make a statement similar to the one above for 
Table 11. At least such a statement serves to summarize the situation
without artificiality.

14. Graphical Interpretation of Mean, Median, and Mode. The 
mean corresponds to the abscissa of the point known in mechanics as 
the centroid of area. If a thin, homogeneous plate of metal cut in 



the shape of a histogram is supported loosely on a horizontal axis 
through its centroid, the plate will have no tendency to rotate, what- 
ever horizontal direction this axis may assume. 

The median of a frequency distribution is the abscissa of a point 
through which a vertical line will divide the total area of the histo- 
gram into two equal parts. 

If a distribution could be represented by a smooth curve, then the 
mode is the abscissa of the highest point on the curve. 

Figure 9 shows the position of the three averages in a moderately
skew distribution. If the distribution were perfectly symmetrical
then all three of these measures of location would coincide.

There is an interesting empirical relationship between the three quantities 
which appears to hold for unimodal curves of moderate asymmetry, namely, 

mean — mode = 3 (mean — median). 

It is a useful mnemonic to observe that the mean, median, and mode occur 
in the same order (or reverse order) as in the dictionary; and that the median is 
nearer to the mean than to the mode, just as the corresponding words are nearer 
together in the dictionary. 1 

15. Discussion. The student primarily interested in the use of 
these averages in practical statistics might reasonably inquire, 
“ Which of the three averages mentioned should be used in a given 
problem? ” The answer depends upon certain properties peculiar to 
each average and upon the nature of the data to be averaged. 

In most cases the mean is a distinctly superior average. It is 
rigorously defined, easily computed, and is most tractable in theoreti- 
cal discussions. 

When the median differs considerably from the mean it is likely 
that the median is the more typical value. The advantage of the 
median over the mean is recognized in at least three situations: 

(a) When occasional and unexpected values occur at the ends of 
the distribution. In such cases the mean would tend to distort the 
true representation of the typical value, being unduly influenced by 
the exceptional values. 

(b) When the data are presented in a table left open at one or both
ends. For example, suppose the registrar’s office of a university
reports the following distribution of grades as given in all departments
for a semester:


    Grade        Below 60    60-69    70-79          80-89    90-100
    Frequency    215         1060     [illegible]    1242     506

A cum f table may be formed and hence the median can be found
without any more information about the values less than 60.

(c) When the observations cannot be measured numerically but 
can be ordered. 

The mode is best adapted to situations where the word “usual”
would be appropriate. Unless a large number of items are considered
the mode can have little practical meaning. It is the appropriate
average in certain questions of marketing because manufacturers
are interested in the type or quality which is usually in demand.
Or again, in an investigation concerning wages and cost of living, the
mode would reflect the average situation. Also, in a mathematical
treatment of frequency curves the concept of the mode is very useful.

1 M. G. Kendall, The Advanced Theory of Statistics, vol. I, p. 35. Lippincott.

Sometimes a distribution has more than one mode, although this is 
usually due to heterogeneous material. In this course we will be 
concerned only with unimodal distributions. 

The above remarks about the appropriateness of various averages 
are made from the standpoint of describing and condensing the data 
per se. A few remarks from a different point of view should perhaps 
be added here. In the theory of sampling, which deals to a large 
extent with estimating from a sample certain constants in the parent 
universe, it is shown that the mean has definite advantages. The 
mean is much more efficient 1 than the median, for example, in esti- 
mating the corresponding average in the universe (except in a special 
case when the universe is an unusual type). 

For a more complete treatment of the applicability of these three 
averages, the student is referred to the following books: 

1. Theory of Statistics — Yule and Kendall, Ch. VII. 

2. The Mathematics of Statistics — Burgess, Ch. V. 

3. Mathematical Statistics — Camp, p. 40. 

Exercises 

1. State what the empirical mode is in each of Tables 8 to 13. 

2. Explain why the median is found from interpolating in the end-x column
and not the mid-x column.

3. Read one or more of the references in §15 and write an essay on the
advantages and limitations of the mean, median, and mode.

4. Find the median IQ for the data in Table 7.

5. Find the median for the data in Table 9.

16. Geometric Mean. The geometric mean of a set of N
positive values is the Nth root of their product. Thus, the geometric
mean (G.M.) of two values is the square root of their product, of three
values the cube root of their product, and in general for the N values
$y_1, y_2, \ldots, y_N$,

$$\text{G.M.} = [y_1 \cdot y_2 \cdot y_3 \cdots y_N]^{1/N}. \tag{10}$$

Equation (10) lends itself to the use of logarithms and frequently they
greatly facilitate the computation of G.M. From (10) we have

$$\log \text{G.M.} = \frac{1}{N}[\log y_1 + \log y_2 + \cdots + \log y_N]. \tag{11}$$

Therefore the arithmetic mean of the logarithms of a set of values
is the same as the logarithm of the geometric mean of the values
themselves.

1 See Economic Control of Manufactured Products, W. A. Shewhart, p. 280.
D. Van Nostrand Co.

Examples: Find the geometric mean of

(a) 3, 6, 12, 24, 48.

Solution:

$$\text{G.M.} = [(3^5)(2^{10})]^{1/5} = (3)(2^2) = 12.$$

(b) 7.96, 13.82, 22.95, 35.34.

Solution:

    log 7.96  = 0.90091
    log 13.82 = 1.14051
    log 22.95 = 1.36078
    log 35.34 = 1.54827
                -------
            4 ) 4.95047
    log G.M.  = 1.23762
    G.M.      = 17.28
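The logarithmic route of (11) is easy to mirror in code. The following is a minimal sketch; the function name `geometric_mean` is our own choice, and common (base 10) logarithms are used only because the worked example uses them.

```python
import math


def geometric_mean(values):
    """Geometric mean as the antilog of the mean logarithm, following (11)."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty set of positive values")
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log


gm_a = geometric_mean([3, 6, 12, 24, 48])            # example (a), about 12
gm_b = geometric_mean([7.96, 13.82, 22.95, 35.34])   # example (b), about 17.28
print(round(gm_a, 6), round(gm_b, 2))
```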

The geometric mean is the appropriate average when the data are
limited at one end of the range and unlimited at the other, and there
tends to be a constant rate of change from one y value to the next.
This is characteristic of values which tend to form a geometric
progression, i.e., which tend to follow the simple exponential law

$$y = ar^x. \tag{12}$$

The student will recall from algebra that a geometric progression can
be put in the form

    x |  0    1     2      ...   x
    y |  a    ar    ar^2   ...   ar^x

The value of any term in the y series is a function of the exponent of r
since a and r are constants. The functional relationship is therefore
represented by (12).

The growth of many quantities in nature follows this law and it is
sometimes called the law of natural growth. With x referring to
time, y may represent, for example, the population of a city, the
enrollment of a school, the weight of a quantity, or the number of
bacteria in a culture. The accumulated amount S, of P dollars
invested at i rate of interest, compounded periodically for n periods
also takes the form of (12), namely,

$$S = P(1 + i)^n,$$

where r is now (1 + i), a is P, and n and S are the variables
corresponding to x and y.

Thus, if $1000 increased at compound interest to $2150 in 31 years,

    $1000                               $2150
      0      1      2     ...     30      31

the geometric average rate at which the money increased is found
as follows:

$$r^n = (1 + i)^{31} = \frac{2150}{1000}$$

$$1 + i = (2.15)^{1/31} = 1.025$$

$$i = 2\tfrac{1}{2}\%.$$

Since there was an increase of 1150/1000 = 115%, the arithmetic average
would be 115/31 = 3.7%, which is also the simple interest rate.
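The contrast between the geometric (compound) rate and the arithmetic (simple) rate can be checked numerically. The sketch below is illustrative only; the helper names `compound_rate` and `simple_rate` are ours and are not taken from the text.

```python
def compound_rate(start, end, periods):
    """Rate i per period such that start * (1 + i) ** periods equals end."""
    return (end / start) ** (1 / periods) - 1


def simple_rate(start, end, periods):
    """Total proportional increase spread evenly over the periods."""
    return (end / start - 1) / periods


i_geometric = compound_rate(1000, 2150, 31)   # roughly 0.025, i.e. 2 1/2 %
i_arithmetic = simple_rate(1000, 2150, 31)    # roughly 0.037, i.e. 3.7 %
print(f"{i_geometric:.2%}  {i_arithmetic:.2%}")
```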

If y in equation (12) represents population, and we are given two
values of y corresponding to two dates N years apart, the geometric
mean enables us to find a fairer estimate of the value of y at the mid
date than would be given by any other average. For example,
suppose we are given that the population of a city was 2500 in 1920
and 5000 in 1930. We wish to estimate the population in 1925 and
to find the average annual rate of increase. If we are given no other
information, our best estimate for 1925 is given by

$$\text{G.M.} = (y_1 \cdot y_2)^{1/2} = (2500 \times 5000)^{1/2} = 3535.$$

The average annual rate of increase is obtained by solving (12) for r as
follows:

$$5000 = 2500\,r^{10}$$

$$2 = r^{10}.$$

Hence $r = \sqrt[10]{2} = 1.0718 = 107.18\%$, so that the average annual
rate of increase is 7.18%. It is now possible to estimate the population
for any intermediate year. Thus, for 1928, we have from (12):

$$y = 2500(1.0718)^8 = 4353.$$
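The population example can be reproduced with a short sketch of law (12). The function names `growth_ratio` and `estimate` are hypothetical labels of ours; the only data used are the two census figures quoted in the text.

```python
def growth_ratio(y_start, y_end, periods):
    """Constant ratio r per period in y = a * r**x fitted through two values."""
    return (y_end / y_start) ** (1 / periods)


def estimate(y_start, r, elapsed):
    """Value of y after `elapsed` periods of growth at ratio r."""
    return y_start * r ** elapsed


r = growth_ratio(2500, 5000, 10)    # 1920 to 1930, about 1.0718
pop_1925 = estimate(2500, r, 5)     # equals the geometric mean of 2500 and 5000
pop_1928 = estimate(2500, r, 8)     # about 4353
print(round(r, 4), round(pop_1925), round(pop_1928))
```

The mid-date estimate coincides with the geometric mean because five periods of growth at ratio r multiply 2500 by the square root of 2.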


The geometric mean is also used in economics in averaging “ index 
numbers ” which are essentially the ratios of prices of commodities at 
one date to their prices at another date. In general it is the appropri- 
ate average when emphasis is on the rate or percentage of change 
rather than the amount. 

17. Harmonic Mean. Another average which has long been
known and which is required in certain problems is the harmonic mean
(H.M.). For the N positive values $x_1, x_2, \ldots, x_N$, it is defined as the
reciprocal of the arithmetic mean of the reciprocals of the values.
In symbols,

$$\text{H.M.} = \frac{1}{\dfrac{1}{N}\left(\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_N}\right)}. \tag{13}$$

This measure is used in averaging ratios, such as rates and prices, when
certain conditions are agreed upon.

In the case of time rates, we have ratios between two quantities 
one of which is in units of time, which we will denote by t, and the 
other is in units of some element like distance or accomplishment or 
temperature, etc. Denote this second element, different from time, 
by d. Then we make the following observations: 

(a) A rate may be stated either in the form d/t or in the form t/d. 
Thus, a car which travels at the rate of 30 miles per hour may also be 
said to travel at the rate of 2 minutes per mile. In this illustration 
the second form is not the usual way of expressing the rate, but there 
are cases in which the form t/d is usual. When we say a man takes 
10 seconds to run 100 yards we are expressing his rate in time per 
unit of distance (t/d). 

(b) In averaging rates one should first decide whether d or t should
properly be the basic or “fixed” element in the discussion.
Occasionally there is a difference of opinion about which element should
most appropriately be regarded as fixed. For example, suppose a
group of students has been given 15 minutes in which to work as many
as they can of a given list of problems, and the number of problems
worked correctly by each student recorded. Some educational
statisticians would say that time should be the fixed element here and
that number of problems solved (in a unit of time) should be the
variable. Others would say that the number of minutes (t) a student
required to work one problem is the proper variable and that
a problem (d) should be regarded as the fixed element in the
discussion.

In one case the rates are equally weighted in the sense of time
and in the other case they are equally weighted in the sense of the
element d.

(c) The harmonic mean of the rates expressed in the form d/t gives
the same result as the arithmetic mean of the same rates expressed in
the form t/d. This is evident from equation (13) if it is written in the
form,

$$\frac{1}{\text{H.M.}} = \frac{1}{N}\sum \frac{1}{x_i}$$

and from the fact that rates in one form are merely the reciprocals of
the same rates in the other form.

As an illustration, let us consider three cars:

    I.   A travels at the rate of 15 miles per hour (1/4 mile per minute),
         B travels at the rate of 20 miles per hour (1/3 mile per minute),
         C travels at the rate of 30 miles per hour (1/2 mile per minute).

But their rates could just as well have been stated as

    II.  A travels at the rate of 4 minutes to the mile,
         B travels at the rate of 3 minutes to the mile,
         C travels at the rate of 2 minutes to the mile.

The harmonic mean of the rates as stated in I is 20 miles per hour;
i.e., 1/3 of a mile per minute, and the arithmetic mean of the rates as
stated in II is 3 minutes per mile or again, 20 miles per hour.
(Verify.)

The third observation, i.e., (c) above, suggests the following
discussion. The arithmetic mean of the rates in I is 21⅔ m.p.h. and this is
the harmonic mean of the rates as stated in II.
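The verification asked for above can be carried out exactly with rational arithmetic. This is a sketch only; the helper names `arithmetic_mean` and `harmonic_mean` are ours, and Python's `fractions.Fraction` is used so that no rounding enters.

```python
from fractions import Fraction


def arithmetic_mean(values):
    return sum(values) / len(values)


def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals, as in (13)."""
    return 1 / arithmetic_mean([1 / v for v in values])


mph = [Fraction(15), Fraction(20), Fraction(30)]   # rates as stated in I
min_per_mile = [60 / v for v in mph]                # rates as stated in II: 4, 3, 2

hm_mph = harmonic_mean(mph)               # 20 miles per hour
am_min = arithmetic_mean(min_per_mile)    # 3 minutes per mile, i.e. 20 m.p.h.
am_mph = arithmetic_mean(mph)             # 65/3 = 21 2/3 miles per hour
hm_min = harmonic_mean(min_per_mile)      # 36/13 minutes per mile

print(hm_mph, am_min, am_mph, 60 / hm_min)
```

Converting `hm_min` back to miles per hour (60 divided by it) gives 65/3, matching the arithmetic mean of the rates in I.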

The question arises, which is the correct average, 20 m.p.h. or 21⅔
m.p.h.? The problem is indeterminate until it is agreed whether
time or distance is the fixed element. The correct average will differ
according to the condition agreed upon. This will be made clear in
the following analysis.

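Case I. Let

$$x_i = \frac{d_i}{t_i}$$

denote the ith rate, i = 1, 2, ..., n. Then the average rate is

$$\frac{D}{T} = \frac{\text{total distance}}{\text{total time}} = \frac{t_1 x_1 + t_2 x_2 + \cdots + t_n x_n}{t_1 + t_2 + \cdots + t_n} = \frac{\sum t_i x_i}{\sum t_i}.$$

Condition 1. Let distance be the fixed element, i.e., let d be
constant. Then $d = t_i x_i$, and $t_i = d/x_i$. Therefore, the expression for
average rate becomes

$$\frac{D}{T} = \frac{\sum t_i x_i}{\sum t_i} = \frac{nd}{d \sum \dfrac{1}{x_i}} = \frac{n}{\sum \dfrac{1}{x_i}},$$

which is the harmonic mean.

Condition 2. Suppose t is the fixed element. Then $\sum t_i x_i$
becomes $t \sum x_i$ since t is a constant, and $\sum t_i$ becomes nt. Hence, we
have for the average rate,

$$\frac{D}{T} = \frac{t \sum x_i}{nt} = \frac{\sum x_i}{n},$$

which is the arithmetic mean.

Case II. Let $x_i = t_i/d_i$ denote the ith rate. Then the average
rate is

$$\frac{T}{D} = \frac{\text{total time}}{\text{total distance}} = \frac{\sum t_i}{\sum d_i}.$$

Condition 1. Suppose d is the fixed element. Then $t_i = d x_i$ and
$d = t_i/x_i$. Hence, we have

$$\frac{T}{D} = \frac{d \sum x_i}{nd} = \frac{\sum x_i}{n}.$$

Condition 2. Let t be fixed. Then $d_i = t/x_i$ and the average
rate is

$$\frac{T}{D} = \frac{nt}{t \sum \dfrac{1}{x_i}} = \frac{n}{\sum \dfrac{1}{x_i}}.$$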







We therefore state the following rules for averaging rates: 

Rule 1. The harmonic mean is used whenever the fixed element is d
and the rates are expressed in the form d/t, or when the fixed element
is t and the rates are expressed in the form t/d.

Rule 2. The arithmetic mean should be used when the fixed element
is t and the rates are expressed in the form d/t, or when the fixed
element is d and the rates are expressed in the form t/d.
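The two rules amount to a single test: if the fixed element is the numerator of the rate as expressed, use the harmonic mean; if it is the denominator, use the arithmetic mean. A minimal sketch of that test follows. The function `average_rate` and its string arguments are our own hypothetical interface, not something the text defines.

```python
def average_rate(rates, form, fixed):
    """Average a list of rates following Rules 1 and 2.

    form:  "d/t" or "t/d", the way the rates are expressed
    fixed: "d" or "t", the element agreed to be held fixed
    """
    if form not in ("d/t", "t/d") or fixed not in ("d", "t"):
        raise ValueError("form must be 'd/t' or 't/d'; fixed must be 'd' or 't'")
    n = len(rates)
    # Rule 1: the fixed element is the numerator of the rate -> harmonic mean.
    if form[0] == fixed:
        return n / sum(1 / x for x in rates)
    # Rule 2: the fixed element is the denominator of the rate -> arithmetic mean.
    return sum(rates) / n


cars = [15, 20, 30]                            # miles per hour, form d/t
by_distance = average_rate(cars, "d/t", "d")   # about 20 m.p.h.
by_time = average_rate(cars, "d/t", "t")       # about 21.67 m.p.h.
print(round(by_distance, 2), round(by_time, 2))
```

With the three cars of the earlier illustration, fixing distance gives 20 m.p.h. and fixing time gives 21⅔ m.p.h., as in the text.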

In the case of prices, which are of course ratios, a similar discussion
holds except that now the unit of time is to be replaced by a unit of
money. Therefore, prices are ratios between two quantities, one of
which is in units of money and the other in units of some commodity
or service. They may be stated as so much money per unit of
commodity (m/c), or as so many units of commodity per dollar (c/m).
Thus, if 100 bushels of wheat are exchanged for 75 dollars of gold, the
price of the wheat in terms of gold is 75 ÷ 100, or three-fourths of a
dollar of gold per bushel of wheat. Contrariwise, the price of gold
in terms of wheat is 100 ÷ 75, or one and one-third bushels of wheat
per dollar of gold. Thus, there are always two prices in any
exchange.

The correct average will depend upon how the prices are stated and 
upon whether a unit of the commodity (or service) or a unit of money 
is the fixed element. 

The following papers in The Journal of the American Statistical 
Association are recommended: 

1. “ The Nature and Use of the Harmonic Mean ” — W. F. Ferger, 
vol. 26 (1931), pp. 36-40. 

2. “ Calculating the Geometric Mean from a Large Amount of 
Data” — Zenon Szatrowski, vol. 41 (1946), pp. 218-220. 


Examples 

1. In a certain factory a unit of work is completed by A in four minutes, by B 
in five minutes, by C in six minutes, by D in ten minutes, and by E in 
twelve minutes. What is their average rate of working? At this rate 
how many units will they complete in a six-hour day? 

Solution. The rates are expressed in the form t/d but it would seem
appropriate to regard t as the basic or fixed element since output per unit of
time appears to be the important consideration here. So by Rule 1,

$$\text{H.M.} = \frac{5}{\frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{10} + \frac{1}{12}},$$

that is,

$$\text{H.M.} = \frac{300}{48} = 6\tfrac{1}{4} \text{ minutes per unit.}$$

In 360 minutes they will complete $\frac{4(360)5}{25} = 288$ units.

2. A tourist purchases gasoline at three stations, as follows:

    Station    Number of gallons of gasoline for $1.00
    A          5
    B          7
    C          6

Here the prices are given in the form c/m and it would seem appropriate to regard
gallon (c) as the fixed element and prices (m) per gallon as the variable quantities
which are to be averaged. Hence, replacing d/t by c/m and “rates” by “prices”
in Rule 1, we are led to find the harmonic mean.

$$\text{H.M.} = \frac{3}{\frac{1}{5} + \frac{1}{7} + \frac{1}{6}} = \frac{630}{107} \text{ gals. per } \$1.00 = \$\frac{107}{630} \text{ per gal.}$$


Exercises 

1. (a) The arithmetic mean of a set of 30 numbers is 82. What is the sum of
these numbers?

(b) The G.M. of ten numbers is 1.40. What is the product of these ten
numbers?

2. In chemistry a student was graded 65 in final examination, 85 in recitation
and 80 in laboratory. These grades were weighted 1, 2, and 3 respectively.
Find the student’s average grade.

3. At the end of his first semester in college a freshman had credits as follows:
4 hours of mathematics with a grade of 88, 4 hours of English with a grade
of 80, 3 hours of history with a grade of 85, and 4 hours of physics with a
grade of 78. What was his average grade per hour of credit?

4. Find the median of Table 12.

5. The population of a city increased in 5 years from 225,000 to 245,000. What
was the average increase per year? What was the average annual rate of
increase?

6. The number of bacteria in a certain culture was found to be 4 X 10* at noon of 
one day. At noon the next day the number was found to be 9 X 10 s . 
If the number increased at a constant rate per hour, how many bacteria
were there at midnight?

7. Find the average (G.M.) rate of interest for five years during which the
interest rates were 4.25%, 5.3%, 4.65%, 3.86%, 4.38%.

Hint. $(1 + i)^5 = (1.0425)(1.053)(1.0465)(1.0386)(1.0438)$.

8. Find the harmonic mean of the first fifteen positive integers.

9. For two positive numbers, a and b, the geometric mean is $x = \sqrt{ab}$. This is
also called the mean proportional between a and b, since a : x = x : b.
By drawing a semicircle on a + b as diameter, show how the value of x can
be constructed geometrically.

10. The following table gives the population of the U. S. at each 10-year census 
from 1860 to 1920. 


    Year    Population     Ratio of Each Census
            (millions)     Figure to Preceding
    1860       31.4
    1870       38.6              1.23
    1880       50.2              1.30
    1890       63.0              1.25
    1900       76.0              1.20
    1910       92.0              1.21
    1920      105.7              1.15


What is the average rate of increase per decade? Using this average, 
estimate the population for 1930 from the 1920 census figure. 

11. If a series of positive variates form a geometric progression show that their 

logarithms form an arithmetic progression. 

12. Find the geometric mean of the following:

(a) 2, 4, 8, 16, 32.

(b) 47, 92, 123, 218.

13. Given two sets of n positive variates each:

$$x_{11}, x_{12}, x_{13}, \ldots, x_{1n}$$

$$x_{21}, x_{22}, x_{23}, \ldots, x_{2n}.$$

Prove that the geometric mean of the ratios of corresponding variates in
the two sets is equal to the ratio of their geometric means.

14. (a) For a frequency distribution of positive variates show that (10) becomes

$$\text{G.M.} = \left[x_1^{f_1} \cdot x_2^{f_2} \cdots x_k^{f_k}\right]^{1/N}$$

where k is the number of different values of x in the set, any exponent $f_i$ is
the number of times $x_i$ is repeated, and $N = \sum_{1}^{k} f_i$.

(b) What is the expression for log G.M. when G.M. is defined as in (a)?

15. A wholesale firm has twelve travelling salesmen who make trips of essentially
the same length. Of these, eight make their trip in 20 days and four in 15
days. What is the average time per trip? Ans. 18 days.

16. State two rules for averaging prices similar to those given for averaging rates.
Give illustrations.

17. Consider any two positive variates $x_1$ and $x_2$. Prove that their geometric
mean is equal to the geometric mean between their arithmetic mean and
their harmonic mean.

18. (Burgess) The following problem arose in a statistical office in Washington
during World War I: Suppose 20 boats make 6 trans-atlantic trips
each per year, giving as the time for a “turn around” (i.e., time between
consecutive departures from the same ports), one-sixth year, approximately
60 days, and that 10 boats make 4 trips per year, giving as their
time for a “turn around” one-fourth year, approximately 90 days. (A
year of 360 days is used merely for convenience.) What is the average
number of days per turn around?

Hint. If we think of the rates expressed as “trips per year” then
x = d/t. If t is regarded as the fixed element, then by Rule 2 the arithmetic
mean is indicated, and x = 6 for 20 values of x, and x = 4 for 10
values.

If we think of the rates expressed as “days per trip” then x = t/d. If
t is the fixed element, by Rule 1 the harmonic mean is the correct average,
and x = 60 for 20 values and x = 90 for 10 values. Ans. 5⅓ trips per
year or 67.5 days per trip.

19. Show that if 2a is the harmonic mean of the two rational numbers b and c,
then the sum of the squares of the three numbers a, b, and c is the square
of a rational number.

(Reference: American Mathematical Monthly, June 1935, p. 394.)

20. (a) If A, G, and H represent, respectively, the arithmetic, geometric, and
harmonic means of N unequal positive variates, prove that

$$H < G < A$$

(Reference: Burgess' text, p. 101.)

(b) What can you say if the N positive variates are equal?

21. A plane travels one half of a given distance D in miles at a speed of $x_1$ miles
per hour, and the remaining half distance at a speed of $x_2$ miles per hour.
Show that the average speed for the entire distance is the harmonic mean
of $x_1$ and $x_2$. Half of this average speed is called the “radius of action
per hour”; i.e., it is the outbound distance that a plane can travel and
return in one hour. The “radius of action” of a plane would be the
“radius of action per hour” multiplied by the number of hours in flight.



CHAPTER IV 
MOMENTS 

1. Moments about an Arbitrary Origin. One of the general prob- 
lems of statistics is to summarize and characterize data. In the 
words of R. A. Fisher, 

A quantity of data which by its mere bulk may be incapable of entering the
mind is to be replaced by relatively few quantities which shall adequately
represent the whole, or which, in other words, shall contain as much as possible,
ideally the whole, of the relevant information contained in the original data. 1

These “ relatively few quantities ” are usually expressed in terms 
of moments. Moments are of different orders and the student is 
already familiar with what is now to be known as the first moment, 
namely, the arithmetic mean of the first powers of the variates. We 
will also need in our work the arithmetic means, respectively, of the 
second, third, and fourth powers of the variates. With reference to 
an arbitrary origin, moments are denoted by v (the Greek letter nu) 
with a subscript specifying the order. 

The first four moments, relative to the x-origin and in the x unit,
are defined as follows:

$$\begin{aligned}
\nu_1 &= \frac{1}{N}\sum f_i x_i = \bar{x} \\
\nu_2 &= \frac{1}{N}\sum f_i x_i^2 \\
\nu_3 &= \frac{1}{N}\sum f_i x_i^3 \\
\nu_4 &= \frac{1}{N}\sum f_i x_i^4,
\end{aligned} \tag{1}$$

i varying from 1 to k.

A more general definition of the $\nu$'s is

$$\nu_r = \frac{1}{N}\sum f_i (x_i - x_0)^r \tag{1a}$$

for the rth moment about an arbitrary point $x_0$. When $x_0 = 0$ and
r = 1, 2, 3, and 4, we have the definitions stated in (1). If r = 0
we have the zeroth moment and $\nu_0 = 1$.

1 Foundations of Theoretical Statistics, Philosophical Transactions of the Royal
Society, vol. 222A (1922), p. 309.
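Definition (1a) translates directly into code. The sketch below is illustrative: the function name `moment_about` is ours, and the small distribution is made up purely to exercise the formula; it is not data from the text.

```python
def moment_about(xs, fs, r, origin=0.0):
    """r-th moment per unit frequency about `origin`, following (1a)."""
    n = sum(fs)
    return sum(f * (x - origin) ** r for x, f in zip(xs, fs)) / n


# A small made-up distribution, for illustration only.
xs = [0, 1, 2, 3]
fs = [1, 3, 3, 1]

nu = [moment_about(xs, fs, r) for r in range(5)]   # nu[0], ..., nu[4] about zero
first_about_mean = moment_about(xs, fs, 1, origin=nu[1])
print(nu, first_about_mean)
```

For this toy distribution the zeroth moment is 1 and the first moment taken about the mean vanishes, as the definitions require.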

In statistics we work with moments per unit frequency. The
term “moment” has its origin in mechanics where we speak of the
“moment of a force.” Suppose we have a rigid bar, called a
lever, with one point of support known as a fulcrum (Figure 10). If
a force $f_1$ is applied to the lever at a distance $x_1$ from the fulcrum
O, the product $x_1 f_1$ is called the moment of the force. If there
are two or more such forces $f_1, f_2, \ldots, f_k$, acting in the same
direction, and at the distances $x_1, x_2, \ldots, x_k$, respectively from O,
the total moment of all these forces is

$$f_1 x_1 + f_2 x_2 + \cdots + f_k x_k = \sum f_i x_i.$$

[Fig. 10. A lever with forces applied at distances $x_1$, $x_2$ from the fulcrum.]

If the distances x are squared, we have $\sum f_i x_i^2$ as the total second
moment, and $\sum f_i x_i^r$ represents the rth moment.

It is by analogy with this mechanical concept that the expressions
in (1) are called statistical moments (per unit frequency) about zero
as origin.


Exercises

1. Write out the expanded form of the $\nu$'s defined in (1).

2. Calculate the values of $\nu_1$, $\nu_2$, and $\nu_3$ for the following distributions:

         (a)               (b)
       x     f           x     f
       0     1          -3     1
       1     3          -2     3
       2     5          -1     5
       3    10           0     5
       4     5           1     3
       5     2           2     1

3. (a) Prove that $\nu_0$ is always equal to unity.

(b) Prove that moments of even order are always positive or zero, but that
moments of odd order may be positive, negative, or zero.

(c) Show that the odd moments are all zero if both the x's and f's are
symmetrical with respect to the origin of x, as, for example,

       x    -1.5   -1.0   -0.5    0.5    1.0    1.5
       f      1      2      3      3      2      1







2. Moments in Units of the Class Interval. In Chapter III,
§8, the mean in the x unit was obtained by first finding the mean in
the u unit, viz., $\bar{u} = \frac{1}{N}\sum f_i u_i$, and then changing over into the x unit by
multiplying by the interval c. In our subsequent work, which requires
the higher moments, we shall find it convenient to use a similar
procedure, and find those moments in the u unit, where
$u = (x - x_0)/c$. It is desirable, therefore, in labeling the moments for
any distribution, to specify whether they are in the unit of x or u.
This is commonly done by the use of a second subscript on $\nu$. Thus
$\nu_{r:u}$ denotes the rth moment in the u unit and relative to the u-origin.
Therefore,

$$\begin{aligned}
\nu_{1:u} &= \frac{1}{N}\sum f_i u_i = \bar{u} \\
\nu_{2:u} &= \frac{1}{N}\sum f_i u_i^2 \\
\nu_{3:u} &= \frac{1}{N}\sum f_i u_i^3 \\
\nu_{4:u} &= \frac{1}{N}\sum f_i u_i^4.
\end{aligned} \tag{2}$$

Similarly, $\nu_{r:x}$ will mean $\frac{1}{N}\sum f_i x_i^r$. When there is no ambiguity, the
second subscript on $\nu$ may be omitted.

3. Moments about the Mean. Formulas (1) and (2) define the
moments taken about zero as origin although in different units.
When the mean is chosen as origin we have the most important set
of moments in the theory of statistics. In this case the Greek letter
$\mu$ (mu) is used to denote the moments, and it is always understood
that the use of $\mu$ specifies the mean as origin. It does not, however,
designate the unit, so the second subscript may still be necessary.
Therefore, the rth moment about the mean is defined by either of the
following expressions:

$$\mu_{r:x} = \frac{1}{N}\sum f_i (x_i - \bar{x})^r, \qquad
\mu_{r:u} = \frac{1}{N}\sum f_i (u_i - \bar{u})^r. \tag{3}$$

The mean is a sort of balance point. If weights proportional to
the frequencies are suspended along a horizontal bar at distances from
one end proportional to the numbers representing the class marks,
then the bar will balance at the weighted mean of the distances. In
mechanics this point is known as the abscissa of the center of gravity
or centroid. Theorem VI of Chapter III, §7, is another way of saying
that the given distribution is in equilibrium about this point.

4. Relations between the $\mu$'s and $\nu$'s. We shall see that the
descriptive constants mentioned at the beginning of the chapter are
defined in terms of the moments about the mean, but the moments
about an arbitrary point are easier to calculate. In other words,
what we desire are the values of $\mu_r$, but their computation directly
from the definitions (3) may be very laborious even in the u unit due
to the fact that $(u - \bar{u})$ usually involves decimals. Raising these
decimals to the second, third, and fourth powers becomes tedious
even with the aid of a computing machine. On the other hand, the
$\nu$'s defined in (2) are readily computed. Therefore, instead of computing
the $\mu$'s directly we obtain them indirectly from the $\nu$'s. The
relations between the $\mu$'s and $\nu$'s can be found by expanding, by the
Binomial Theorem, either of the expressions following the $\sum$'s in
(3) for r = 2, 3, 4. This is done in the u unit as follows:

$$\begin{aligned}
\mu_2 &= \frac{1}{N}\sum f_i (u_i - \bar{u})^2 \\
      &= \frac{1}{N}\sum f_i u_i^2 - 2\bar{u}\,\frac{1}{N}\sum f_i u_i + \bar{u}^2\,\frac{1}{N}\sum f_i \\
      &= \nu_2 - 2\bar{u}\,\nu_1 + \bar{u}^2 \\
      &= \nu_2 - (\nu_1)^2, \quad \text{since } \bar{u} = \nu_1
\end{aligned} \tag{4}$$

$$\begin{aligned}
\mu_3 &= \frac{1}{N}\sum f_i (u_i - \bar{u})^3 \\
      &= \nu_3 - 3\nu_2 \cdot \nu_1 + 2(\nu_1)^3
\end{aligned} \tag{5}$$

$$\mu_4 = \nu_4 - 4\nu_3 \cdot \nu_1 + 6\nu_2(\nu_1)^2 - 3(\nu_1)^4. \tag{6}$$

These formulas are important and the student should be able to
derive them. It should be apparent that these moment relations
hold also in the x unit. However, if we have the $\mu$'s in the u unit
and we desire them in the x unit they may be found as follows:

$$\begin{aligned}
\mu_{2:x} &= c^2 \mu_{2:u} \\
\mu_{3:x} &= c^3 \mu_{3:u} \\
\mu_{4:x} &= c^4 \mu_{4:u}.
\end{aligned} \tag{7}$$

The first of the relations given in (7) is proved below. The others
may be proved in a similar manner.

$$\begin{aligned}
\mu_{2:x} &= \frac{1}{N}\sum f_i (x_i - \bar{x})^2 && \text{by definition,} \\
          &= \frac{1}{N}\sum f_i (x_0 + c u_i - x_0 - c\bar{u})^2 && \text{by (4a) and (5), Chapter III,} \\
          &= \frac{c^2}{N}\sum f_i (u_i - \bar{u})^2 = c^2 \mu_{2:u}.
\end{aligned}$$

We see that the indirect method of computing the $\mu$'s (in the u unit)
involves two steps. First the $\nu$'s are computed according to the
definitions in (2). This step is illustrated in Table 18. Then we
calculate the $\mu$'s by substituting the computed $\nu$'s in relations (4),
(5), and (6). The $\mu$'s in the x unit could then be obtained, if desired,
by means of (7).

Before proceeding with the second step it is desirable to check the
$\nu$'s or, at least, the totals of the columns from which they are
obtained. This can be done if we have another column headed
$f(u + 1)^4$, and observe that

$$\sum f(u + 1)^4 = \sum fu^4 + 4\sum fu^3 + 6\sum fu^2 + 4\sum fu + \sum f.$$

This is known as Charlier’s check. An alternative one is to check the
entries in the column $fu^4$ against the proper entries in Pearson's
Tables for Statisticians and Biometricians, Table L.

Charlier's check is a necessary but not a sufficient check. That is
to say, compensating errors may occur which this check would not
detect. However, the occurrence of such errors is very unlikely.
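Charlier's check is simple to automate. The sketch below uses the u values and frequencies of Table 18 (the distribution of grades); the helper name `column_sum` is our own.

```python
# u values and frequencies of Table 18.
us = [-4, -3, -2, -1, 0, 1, 2]
fs = [2, 3, 11, 20, 32, 25, 7]


def column_sum(power, shift=0):
    """Sum of f * (u + shift) ** power over the table."""
    return sum(f * (u + shift) ** power for u, f in zip(us, fs))


lhs = column_sum(4, shift=1)                       # total of the f(u + 1)^4 column
rhs = (column_sum(4) + 4 * column_sum(3) + 6 * column_sum(2)
       + 4 * column_sum(1) + column_sum(0))        # binomial expansion of the same total
print(lhs, rhs)  # 1220 1220
```

Agreement of the two totals does not prove the columns are right, since compensating errors are possible, but a disagreement shows that at least one column total is wrong.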

Applying Charlier's check to Table 18 we have

    1220 = 1088 + 4(-236) + 6(176) + 4(-20) + 100 = 1220.

Table 18. Moments for Distribution of Grades

         Data                           Computations
      x       f       u      fu     fu^2     fu^3     fu^4    f(u+1)^4
     34.5      2     -4      -8      32     -128      512       162
     44.5      3     -3      -9      27      -81      243        48
     54.5     11     -2     -22      44      -88      176        11
     64.5     20     -1     -20      20      -20       20         0
     74.5     32      0       0       0        0        0        32
     84.5     25      1      25      25       25       25       400
     94.5      7      2      14      28       56      112       567

    Sums     100            -20     176     -236     1088      1220

    (1/N) Sums 1           -.20    1.76    -2.36    10.88    For Charlier's
                           ν1:u    ν2:u    ν3:u     ν4:u     check


Hence we may proceed with confidence to compute the $\mu$'s. Using
relations (4), (5), and (6):

$\mu_{2:u} = 1.76 - (-.20)^2 = 1.72$

$\mu_{3:u} = -2.36 - 3(1.76)(-.20) + 2(-.20)^3 = -1.320$

$\mu_{4:u} = 10.88 - 4(-2.36)(-.20) + 6(1.76)(-.20)^2 - 3(-.20)^4$

$= 9.4096.$
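For readers working with a computer rather than a desk machine, the second step may be sketched as follows. The function name `central_moments` is our own choice; the arithmetic is simply relations (4), (5), and (6) applied to the $\nu$'s of Table 18.

```python
def central_moments(v1, v2, v3, v4):
    """Moments about the mean from moments about the origin (same unit)."""
    mu2 = v2 - v1**2                                        # relation (4)
    mu3 = v3 - 3 * v2 * v1 + 2 * v1**3                      # relation (5)
    mu4 = v4 - 4 * v3 * v1 + 6 * v2 * v1**2 - 3 * v1**4     # relation (6)
    return mu2, mu3, mu4


# The nu's in the u unit, taken from the (Sums)/N row of Table 18.
mu2, mu3, mu4 = central_moments(-0.20, 1.76, -2.36, 10.88)
print(round(mu2, 4), round(mu3, 4), round(mu4, 4))
```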

The following check, which can be handled readily on a machine,
may be used to check the $\mu$'s:

$\nu_4 = \frac{1}{N} \sum f_i u_i^4 = \frac{1}{N} \sum f_i [(u_i - \nu_1) + \nu_1]^4$

$= \mu_4 + 4\mu_3\nu_1 + 6\mu_2\nu_1^2 + \nu_1^4.$

Before explaining the applications of $\mu_2$, $\mu_3$, and $\mu_4$ we present some
exercises which will aid the student in mastering the procedure thus
far developed.

Exercises 

1. (a) Verify relations (4), (5), and (6).

(b) Show that these relations hold also in the x unit.

(c) Prove that $\mu_1 = 0$ in any unit.
(d) When $f_i = 1$, show that


Mi:* 
2. Verify the relations given in (7).

3. Using Table 18 as a model find the $\nu$'s for Iowa City rainfall by extending
Table 17.

4. Find the $\mu$'s from your results in Exercise 3 above.


5. Standard Deviation. Formula (4), $\mu_2 = \nu_2 - \nu_1^2$, is perhaps
the most important of the moment relations for elementary statistics.
It states that the second moment about the mean is equal to the
second moment about zero diminished by the square of the mean
measured from zero.

Many of the definitions in statistics are essentially those of physics 
and mechanics. The analogy between the mean and centroid has 
been mentioned. The above statement about formula (4) is a well-known
proposition in mechanics when the word centroid is substituted
for mean.

In mechanics the equivalent of $N\mu_2$ is called the moment of inertia
(about the axis through the centroid) and $(\mu_2)^{1/2}$ is the radius of
gyration. These notions are carried over in statistics. Suppose a thin
metal plate in the shape of a histogram is rotating about a vertical
axis through its centroid. There is a distance from the centroid at
which the entire mass of the histogram could be concentrated
without changing its moment of inertia. This distance is the
square root of $\mu_2$. It is an average rotational radius for all
particles of the rotating mass. In statistics, $(\mu_2)^{1/2}$ is called the
standard deviation and is denoted by the small Greek letter $\sigma$. Therefore
we have


(8) $\sigma_x = (\mu_{2:x})^{1/2}, \quad \sigma_u = (\mu_{2:u})^{1/2}, \quad \sigma_x = c\sigma_u.$


We shall see later that $\sigma$ is a measure of what is called dispersion.
More precisely, it measures the extent to which the data are spread
out "on the average" on either side of the mean. (See Figure 11.)
The student will obtain a more complete understanding of $\sigma$ as the
course develops.

The mean and standard deviation are always expressed finally in 
the same units as the variates. If x represents inches, we desire the 
mean and standard deviation in inches. When obtained they should 
be labelled appropriately. 
Example. For Table 18, we have

$\bar{x} = c\bar{u} + x_0 = 10(-.20) + 74.5 = 72.5\%$

$\sigma_u = (1.72)^{1/2} = 1.31$
$\sigma_x = c\sigma_u = 10(1.31) = 13.1\%$

Thus, we have explained the use of the first and second moments. 

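The student will observe that the change from $\sigma_u$ to $\sigma_x$ does not
involve $x_0$. The standard deviation is affected by the change in units
but is independent of the origin of reference. To prove this let
$x' = x - x_0$, whence $\bar{x}' = \bar{x} - x_0$ (why?). Then

$\sigma_{x'}^2 = \mu_{2:x'} = \frac{1}{N} \sum f_i (x_i' - \bar{x}')^2$

$= \frac{1}{N} \sum f_i [x_i - x_0 - \bar{x} + x_0]^2$

$= \frac{1}{N} \sum f_i (x_i - \bar{x})^2$

$= \mu_{2:x} = \sigma_x^2.$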

This suggests the more general 

Theorem. The value of $\mu_r$ remains invariant under a transformation
which changes only the origin of reference of the variates.

The student is asked to prove the equivalent of this theorem in 
Exercise 3 after §9. 

6. Standard Units. The above section explains $\mu_2$. There remains
the explanation of $\mu_3$ and $\mu_4$. We will lead up to this by
defining standard units. We have mentioned the transformation
$x' = x - \bar{x}$. Another very useful transformation consists in measuring
such deviations from the mean in units of the standard deviation,
$\sigma_x$, of the entire distribution. They are then known as standard
units and will be designated by t. Thus,

(9) $t = \frac{x - \bar{x}}{\sigma_x} = \frac{x'}{\sigma_x}.$


Graphically, this translates the origin to the mean and measures
distances along the horizontal axis in terms of $\sigma_x$. It is a special case
of the more general transformation

$u = \frac{x - x_0}{c}.$


The significant characteristic of the t variate is its independence of
the unit in which the original measurements were taken. For example,
suppose we were concerned with obtaining the linear measurements
of a set of individuals. One distribution of variates would
result if the measurements were made in feet; in this case $x'$, $\bar{x}$, and
$\sigma_x$ would also be in feet. If the measurements were taken in inches,
then $x'$, $\bar{x}$, and $\sigma_x$ would be in inches, and each of these values would
be, numerically, twelve times as large as the corresponding numbers
in the first distribution. However, the variates expressed in standard
units would be the same for the two distributions. Thus if

$\bar{x} = 50$ ft. $= 50(12)$ in.,

and

$\sigma_x = 5$ ft. $= 5(12)$ in.,

then for an individual measurement of $x = 60$ ft. $= 60(12)$ in., we
have

$t = \frac{10 \text{ ft.}}{5 \text{ ft.}} = \frac{10(12) \text{ in.}}{5(12) \text{ in.}}$

$t = 2 = 2.$


It is obvious, therefore, that standard units provide a basis for 
comparing distributions. Moreover, they make possible important 
simplifications in certain mathematical operations. 
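The independence of t from the unit of measurement can be tried out numerically. The sketch below is only an illustration: the five measurements are hypothetical, chosen for convenience, and the function name `standard_units` is ours.

```python
import math


def standard_units(values):
    """Convert raw variates to standard units t = (x - mean) / sigma."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return [(x - mean) / sigma for x in values]


# Hypothetical measurements in feet, and the same measurements in inches.
lengths_ft = [40.0, 45.0, 50.0, 55.0, 60.0]
lengths_in = [12 * x for x in lengths_ft]

t_ft = standard_units(lengths_ft)
t_in = standard_units(lengths_in)
print(t_ft)
print(t_in)
```

The two printed lists agree to within rounding, since the factor 12 enters both the deviation and the standard deviation and divides out.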

With the aid of a computing machine, a distribution may be easily 
transformed into standard units by means of the so-called continuous 
process. To illustrate, suppose for the distribution of Table 9 (§11, 
Chapter I), it has been found that 

$\bar{x} = 47.712$ lbs.

$\sigma_x = 5.772$ lbs.
By relation (9), then,

$t = \frac{x - 47.712}{5.772} = .17325x - 8.2661.$

Referring to the discussion of the continuous method given in the
Introduction, we observe that here k = -8.2661, n = .17325, and we
desire the values of t corresponding to the values of x given in Table 9.
For the values of x such that nx < k, we write the above relation in
the form

$-t = 8.2661 - .17325x.$

The procedure 1 now is to register 8.266100 on the product register, 
punch the constant factor .17325 on the keyboard, and then by turn- 
ing the crank backward so that the successive values of x appear on 
the revolution register, we subtract from k the products of this mul- 
tiplier and the values of x. The various values of x are built over 
from one to another without clearing the dial. The resulting values 
of — t are read at each stage from the product register until we get 
— t = 0.383. From here, nx > k , so we clear the dials and start 
over using the original form of the relation between x and t. We now 
register —8.266100 on the product register by turning the crank 
backward, punch .17325 on the keyboard, and turn the crank for- 
ward to form the values of x on the revolution register. The values 
of t are read as before from the product register at each stage of the 
build-over process. In this way the following set of standard vari- 
ates is obtained : 

Table 19

     x       f        t
   29.5      1     -3.155
   33.5     14     -2.462
   37.5     56     -1.770
   41.5    172     -1.076
   45.5    245     -0.383
   49.5    263      0.310
   53.5    156      1.003
   57.5     67      1.696
   61.5     23      2.389
   65.5      3      3.082
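Where no calculating machine is at hand, the same linear relation may be evaluated directly. The following sketch assumes only the mean and standard deviation quoted above for Table 9; the variable names are our own, and the rounded constants `n` and `k` are formed as in the text.

```python
x_bar, sigma_x = 47.712, 5.772

# Constants of the linear relation t = n*x + k, rounded as in the text.
n = round(1 / sigma_x, 5)        # .17325
k = -round(x_bar / sigma_x, 4)   # -8.2661

# Class marks of Table 19: 29.5, 33.5, ..., 65.5.
class_marks = [29.5 + 4 * i for i in range(10)]
t_values = [n * x + k for x in class_marks]

for x, t in zip(class_marks, t_values):
    print(f"{x:5.1f}  {t:7.3f}")
```

Values computed this way may differ from the tabulated ones by a unit in the last decimal place, depending on how the intermediate products are rounded.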


1 If automatic machines are available the instructor will explain the pro- 
cedure. 
We see from Table 19 that a range of t = ±3 takes in practically all
the variates. This is typical of the more common distributions.

If $\bar{x} = 0$, then $t = x/\sigma$ and the origin of t is the same as the origin of
x. Some writers use X to denote the variates (i.e., pounds, dollars,
temperatures, etc.), and use x to denote deviations from the mean.
In that notation, $t = x/\sigma$ would have the same meaning as our
equation (9). Occasionally in later chapters we shall find it convenient
to designate deviations from the mean by x (instead of $x'$). If so, it
will be stated that the origin of x is at the mean or centroid.

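7. Moments in Standard Units. The moments in standard units
are denoted by the Greek letter alpha, $\alpha$. Thus for the rth moment
in standard units, we have $\alpha_r = \frac{1}{N} \sum f_i t_i^r$. However, it is not
necessary to transform the variates into t units in order to compute the $\alpha$'s.
We shall show that they are functions of the $\mu$'s. Thus

$\alpha_r = \frac{1}{N} \sum f_i t_i^r$   by definition

$= \frac{1}{N} \sum f_i \left( \frac{x_i - \bar{x}}{\sigma_x} \right)^r$   from (9)

$= \frac{1}{(\sigma_x)^r} \cdot \frac{1}{N} \sum f_i (x_i - \bar{x})^r$   Why?

$= \frac{\mu_{r:x}}{(\sigma_x)^r}.$   Why?

Hence

(10) $\alpha_r = \frac{\mu_{r:x}}{(\sigma_x)^r} = \frac{\mu_{r:x}}{(\mu_{2:x})^{r/2}}$   from (8).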

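Letting r = 1, 2, 3, 4 in (10) we have

(10a) $\alpha_1 = \frac{\mu_{1:x}}{\sigma_x} = 0, \quad \alpha_2 = \frac{\mu_{2:x}}{\sigma_x^2} = 1, \quad \alpha_3 = \frac{\mu_{3:x}}{(\sigma_x)^3}, \quad \alpha_4 = \frac{\mu_{4:x}}{(\sigma_x)^4}.$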

It is obvious that $\alpha_1$ and $\alpha_2$ are abstract numbers. This is also the
case for the other $\alpha$'s. In the expressions for $\alpha_3$ and $\alpha_4$ both
numerator and denominator are of the same dimension. That is to say, in
$\alpha_3 = \mu_3/\sigma^3$ both numerator and denominator are the cubes of
whatever unit is used in the original measurements, and therefore their
ratio is of zero dimension, a pure number. Similarly, in $\alpha_4 = \mu_4/\sigma^4$
both numerator and denominator are the fourth powers of the same
unit, and therefore $\alpha_4$ is an abstract number.

Some writers use $g_1$ instead of $\alpha_3$ and $g_2$ for $\alpha_4 - 3$.

8. Use of $\alpha_3$ and $\alpha_4$. Since $\alpha_1$ and $\alpha_2$ have the same values for all
frequency distributions, their computation contributes nothing to
the description or characterization of a distribution. But the values
of $\alpha_3$ and $\alpha_4$ depend upon the shape of the histogram representing a
distribution, and are therefore useful in distinguishing between types
of distributions. Thus, we observe that

$\mu_3 = \frac{1}{N} \sum f_i (x_i - \bar{x})^3$

is a measure of asymmetry about the mean. If the variates are
distributed symmetrically about $\bar{x}$ then $\mu_3 = 0$. But if the positive
deviations from the mean outweigh the negative deviations then
$\mu_3 > 0$, whereas if the negative deviations predominate, then $\mu_3 < 0$.
Cubing the deviations gives a measure which is sensitive both to their
size and sign but the result is in cubic units. Now symmetry, or lack
of it, is not a function of the original units of measurement, so if we
divide $\mu_3$ by $\sigma^3$ we get a pure number. Thus $\alpha_3$ is a satisfactory
measure for comparing symmetry in distributions of different units of
measurement.

The quantity $\alpha_4$ measures a characteristic called "kurtosis." It
refers to the relative number of variates in the vicinity of the mean.
More will be said about $\alpha_3$ and $\alpha_4$ later on. At this time, emphasis
should be placed upon their calculation rather than upon the
information which they yield.

Inasmuch as the $\alpha$'s are independent of the unit of measurement,
they may be computed from the moments in the u unit. Changing
these moments into the x unit would only introduce the same factor
into the numerator and denominator, which would of course divide
out. Thus:

$\alpha_3 = \frac{\mu_{3:x}}{\sigma_x^3} = \frac{c^3 \mu_{3:u}}{c^3 \sigma_u^3} = \frac{\mu_{3:u}}{\sigma_u^3}$

$\alpha_4 = \frac{\mu_{4:x}}{\sigma_x^4} = \frac{c^4 \mu_{4:u}}{c^4 \sigma_u^4} = \frac{\mu_{4:u}}{\sigma_u^4}$





For Table 18 we have

$\alpha_3 = \frac{-1.320}{(1.72)(1.31)} = -0.586,$

$\alpha_4 = \frac{9.4096}{(1.72)^2} = 3.18.$


Although no limits can be placed on the possible values which $\alpha_3$
and $\alpha_4$ may take, it may be said that for the more common
distributions $\alpha_4$ fluctuates around 3 and $\alpha_3$ is usually not more than 2 nor
less than -2. We cannot go into the theoretical reasons for these
values and we mention them here merely to guide the student as to
what is a reasonable result to expect in the exercises in this book.
In this connection, the inequality 1

$\alpha_4 \geq \alpha_3^2 + 1$

may also prove useful. When the numerical value of $\alpha_3$ is large, the
distribution may be of the J-shaped type which is an extreme form
of the asymmetrical type. However, these types cannot always be
distinguished by elementary methods if the original data are not
available.
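As a brief illustration of the two $\alpha$'s and of the inequality just stated, the values for Table 18 may be computed as below. This is a sketch only; it uses the unrounded $\sigma_u$, so the third decimal of $\alpha_3$ may differ slightly from a hand computation made with $\sigma_u = 1.31$.

```python
import math

# Moments about the mean in the u unit, as found for Table 18.
mu2, mu3, mu4 = 1.72, -1.320, 9.4096

sigma_u = math.sqrt(mu2)
alpha3 = mu3 / sigma_u**3     # skewness measure
alpha4 = mu4 / mu2**2         # kurtosis measure (sigma^4 = mu2^2)

print(round(alpha3, 3), round(alpha4, 3))
print(alpha4 >= alpha3**2 + 1)   # the inequality should hold
```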

9. Summary. The quantities $\bar{x}$, $\sigma_x$, $\alpha_3$, and $\alpha_4$ are called the
descriptive constants of the distribution. They (together with N) are
the "relatively few quantities" (§1) which, in certain cases,
contain all the relevant information in the distribution. Table 20 will
serve as a model for the procedure which the student should follow
in computing these quantities. Of course, if the work is done on a
computing machine, only the totals of the power sums need be
recorded. The detail of the columns may be omitted. In Table 20,
c = 1, so $\sigma_x = \sigma_u$. Obviously, this would not be true in general.

The calculation of the $\nu$'s proceeds naturally as an extension of the
work required to compute $\bar{x}$ for a frequency distribution. Thus to
obtain $\bar{x}$ we first compute $\nu_{1:u}$ and then obtain $\bar{x}$ from the relation

$\bar{x} = c\bar{u} + x_0.$

To obtain the standard deviation we need the value of $\nu_2$ because $\sigma_u$
is found from the relations

$\mu_2 = \nu_2 - \bar{u}^2$



1 “ A Note on Skewness and Kurtosis ” — J. Ernest Wilkins, Jr. Annals of 

Mathematical Statistics , vol. 15 (1944), pp. 333-335. 
The next chapter is devoted to a discussion of dispersion of which $\sigma_x$
is a measure. To be sure, the standard deviation is only one of several
measures of dispersion, just as the mean is only one of several
averages. But both the mean and the standard deviation play important
roles in the theory and practice of statistics. It is important to
master the pattern by which they are computed in a frequency
distribution.

In order to compute $\alpha_3$ and $\alpha_4$ we first require $\nu_3$ and $\nu_4$ (in addition
to $\nu_1$ and $\nu_2$). Then $\mu_3$ and $\mu_4$ are obtained from (5) and (6). Finally,

$\alpha_r = \frac{\mu_{r:u}}{(\sigma_u)^r}$

is computed for r = 3 and r = 4. The characteristics of a
distribution which $\alpha_3$ and $\alpha_4$ describe will be discussed in Chapter VI and
again in Part II. In elementary work they are less important than
$\bar{x}$ and $\sigma_x$.

With regard to the number of decimal places to be retained in 
computations, the author agrees with Dr. Shewhart who says: “It 
does not appear feasible ... to lay down simple, practical, and in- 
fallible rules.” Reasons in support of this opinion are stated in his 
book, 1 pp. 79-80. For other remarks in this connection, the reader 
is referred to the books by Walker and Scarborough which are cited 
in our Introduction. 


Exercises 

1. (a) What is the numerical value of the mean of any distribution of variates
expressed in t units?

(b) What is the standard deviation of such a distribution? Hint: $\alpha_2$ =

2. (a) Show that $(x - \bar{x}) = c(u - \bar{u})$ and hence that $t = (u - \bar{u})/\sigma_u$.

(b) Show that we obtain the same results for the $\alpha$'s if we take

$t = \frac{u - \bar{u}}{\sigma_u}.$

3. Prove: If any constant is added algebraically to each variate of a series the
values of $\mu_r$ for the new series will be identical with the corresponding values
of $\mu_r$ of the original series.

4. Suppose each variate is multiplied by a constant. What effect would this
have on $\bar{x}$, $\sigma_x$, $\alpha_3$, and $\alpha_4$?

1 See footnote, p. 52. 
5. Show that the standard deviation of x may be written

- 2) 2 i 

-K^-r- 

6. Prove the general relation

$\mu_{r:x} = c^r \mu_{r:u}$

of which the relations given in (7) are special cases when r = 2, 3, 4.
Hint: $(x - \bar{x}) = c(u - \bar{u})$.

7. (a) Show that $\alpha_0 = 1$.

(b) Show that $\sigma^r = (\mu_2)^{r/2}$ in both the x and u units.

8. Prove from (4) that $\mu_2$ is less than or at most equal to $\nu_2$, the same unit being
used in each case.

9. Find $\bar{x}$, $\sigma_x$, $\alpha_3$, and $\alpha_4$ for Iowa City rainfall using your results from
Problem 4 of the preceding set of Exercises.

Ans.

$\bar{x} = 2.80$ in., $\alpha_3 = 1.29$,

$\sigma_x = 2.01$ in., $\alpha_4 = 4.58$.

10. Using Table 20 as a model find $\bar{x}$, $\sigma_x$, $\alpha_3$, and $\alpha_4$ for the distributions in §11,
Chapter I, according to the direction of the instructor.
Table 20. Specimen Worksheets for Computing the Characterising
Constants of a Distribution

Subject: Span among Adult Males (Table 13)

     x        f      u      uf     u^2 f     u^3 f      u^4 f   (u+1)^4 f
   58.5       1    -11     -11       121    -1,331     14,641     10,000
   59.5       2    -10     -20       200    -2,000     20,000     13,122
   60.5       1     -9      -9        81      -729      6,561      4,096
   61.5       6     -8     -48       384    -3,072     24,576     14,406
   62.5       7     -7     -49       343    -2,401     16,807      9,072
   63.5      22     -6    -132       792    -4,752     28,512     13,750
   64.5      55     -5    -275     1,375    -6,875     34,375     14,080
   65.5     111     -4    -444     1,776    -7,104     28,416      8,991
   66.5     146     -3    -438     1,314    -3,942     11,826      2,336
   67.5     182     -2    -364       728    -1,456      2,912        182
   68.5     229     -1    -229       229      -229        229          0
   69.5     265      0       0         0         0          0        265
   70.5     263      1     263       263       263        263      4,208
   71.5     217      2     434       868     1,736      3,472     17,577
   72.5     176      3     528     1,584     4,752     14,256     45,056
   73.5     132      4     528     2,112     8,448     33,792     82,500
   74.5      82      5     410     2,050    10,250     51,250    106,272
   75.5      48      6     288     1,728    10,368     62,208    115,248
   76.5      20      7     140       980     6,860     48,020     81,920
   77.5      16      8     128     1,024     8,192     65,536    104,976
   78.5      12      9     108       972     8,748     78,732    120,000
   79.5       3     10      30       300     3,000     30,000     43,923
   80.5       1     11      11       121     1,331     14,641     20,736
   81.5       2     12      24       288     3,456     41,472     57,122
   82.5       1     13      13       169     2,197     28,561     38,416
   Sums   2,000            886    19,802    35,710    661,058    928,254
   (Sums)/N               .443     9.901    17.855    330.529
                     $\bar{u}$   $\nu_2$   $\nu_3$   $\nu_4$

Charlier's check:

$\sum (u + 1)^4 f = \sum u^4 f + 4\sum u^3 f + 6\sum u^2 f + 4\sum uf + \sum f$

928,254 = 661,058 + 4(35,710) + 6(19,802) + 4(886) + 2,000 = 928,254
Computations:

$\bar{x} = c\bar{u} + x_0 = (1)(.443) + 69.5 = 69.943$ in.
$\bar{u}^2 = .196249$

$\bar{u}^3 = .086938, \quad \bar{u}^4 = .038514$

$\mu_2 = \nu_2 - \bar{u}^2$

$= 9.901 - .196249 = 9.704751$
$\sigma_u = \sqrt{9.704751} = 3.115$
$\sigma_x = c\sigma_u = (1)(3.115) = 3.115$ in.
$\mu_3 = \nu_3 - 3\nu_2\bar{u} + 2\bar{u}^3$

$= 17.855 - 3(9.901)(.443) + 2(.086938)$

$= 17.855 - 13.158429 + .173876$
$= 4.870447$

$\mu_4 = \nu_4 - 4\nu_3\bar{u} + 6\nu_2\bar{u}^2 - 3\bar{u}^4$

$= 330.529 - 4(17.855)(.443) + 6(9.901)(.196249) - 3(.038514)$
$= 330.529 - 31.639060 + 11.658368 - .115542$
$= 310.432766$

$\sigma_u^3 = (3.115)(9.704751) = 30.230299$
$\sigma_u^4 = (9.704751)^2 = 94.182192$

Summary:

$\alpha_3 = \frac{4.870447}{30.230299} = .161$

$\alpha_4 = \frac{310.432766}{94.182192} = 3.296$

$\bar{x} = 69.943$ in.; $\alpha_3 = 0.161$;

$\sigma_x = 3.115$ in.; $\alpha_4 = 3.296$.
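The whole worksheet can be condensed into a short routine. The sketch below is offered only as one possible arrangement: the function name `descriptive_constants` and its arguments are our own, and the class marks and frequencies are those read from Table 20 with provisional origin 69.5 and class interval 1.

```python
import math

# Class marks 58.5, 59.5, ..., 82.5 and frequencies as read from Table 20.
marks = [58.5 + i for i in range(25)]
freqs = [1, 2, 1, 6, 7, 22, 55, 111, 146, 182, 229, 265, 263,
         217, 176, 132, 82, 48, 20, 16, 12, 3, 1, 2, 1]


def descriptive_constants(marks, freqs, x0, c):
    """Return (mean, sigma_x, alpha3, alpha4) by the indirect method."""
    n = sum(freqs)
    u = [(x - x0) / c for x in marks]
    # Moments about the provisional origin, in the u unit.
    v1, v2, v3, v4 = (
        sum(f * ui**r for f, ui in zip(freqs, u)) / n for r in (1, 2, 3, 4)
    )
    # Relations (4), (5), and (6).
    mu2 = v2 - v1**2
    mu3 = v3 - 3 * v2 * v1 + 2 * v1**3
    mu4 = v4 - 4 * v3 * v1 + 6 * v2 * v1**2 - 3 * v1**4
    sigma_u = math.sqrt(mu2)
    return c * v1 + x0, c * sigma_u, mu3 / sigma_u**3, mu4 / mu2**2


mean, sigma_x, alpha3, alpha4 = descriptive_constants(marks, freqs, 69.5, 1)
print(round(mean, 3), round(sigma_x, 3), round(alpha3, 3), round(alpha4, 3))
```

Because the routine carries full precision throughout, its results should agree with the worksheet to the three decimals retained in the Summary.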


10. Sheppard's Corrections. The moments of a frequency dis- 
tribution are computed on the assumption that each variate value in 
a class interval has the value of the class mark for that interval. This 
has the effect of replacing the actual data by somewhat fictitious data 
assigned arbitrarily at the central values of the intervals. Evidently 
a very coarse grouping might be misleading and it can be shown math- 
ematically that the above assumption introduces a systematic error, 
called a grouping error, in the results obtained for the second and 
fourth moments about the mean but does not affect $\mu_1$ and $\mu_3$. To
eliminate this systematic tendency certain corrections are applied to
$\mu_2$ and $\mu_4$.

The derivation of these corrections is beyond the scope of an ele- 
mentary course, but it may be worth while to see why it is that cor- 
rections are necessary for some moments and not for others. The 
following argument is intended only as a pedagogical device to give 
a plausible explanation. Suppose a smooth curve represents the 
true frequency distribution while the histogram represents the
distribution with class marks as the variates. Since the moments are
computed from the distribution represented by the histogram, we
scarcely expect our results to be exactly the values of the moments
of the true distribution, which are, of course, what we seek. In using
the distribution represented by the histogram, we are neglecting, for
each rectangle, the little area under the curve shaded A and
substituting for it the little area shaded B. Suppose that, in general,
B is a little larger than A, as shown in Figure 12. The excess of B
over A for those rectangles to the left of $\bar{x}$ will be negative; the
corresponding excess for those rectangles to the right of $\bar{x}$ will be
positive. This may be readily understood by considering these little areas
as approximate triangles whose bases are negative or positive
according as they are to the left or right of $\bar{x}$. These excesses for all the
rectangles, both positive and negative, are involved in taking the
summation $\sum f_i (x_i - \bar{x})^r$ for the moments. When r is an odd number,
as 1 or 3, the excesses show up with their algebraic signs and
therefore, over the range of the distribution, the positive excesses just
about offset the negative ones. But in the case of the even moments,
all the excesses now become positive so that the errors accumulate
and the final results for these moments are too large.

To reduce these errors due to grouping, W. F. Sheppard has
demonstrated 1 that the following corrections should be applied. It should
be noticed that as we state them here they should be applied only
where the class interval is unity, i.e., in the u unit.

1 Students familiar with more advanced mathematics will find an interesting
discussion of systematic errors and references to papers dealing with Sheppard's
corrections in an article by H. C. Carver, Annals of Mathematical Statistics,
vol. 7, p. 154.

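Corrected $\mu_{2:u}$ = uncorrected $\mu_{2:u} - \frac{1}{12}$

Corrected $\mu_{3:u}$ = uncorrected $\mu_{3:u}$

Corrected $\mu_{4:u}$ = uncorrected $\mu_{4:u} - \frac{1}{2}$ (uncorrected $\mu_{2:u}$) $+ \frac{7}{240}$

$\left( \frac{1}{12} = 0.08333, \quad \frac{7}{240} = 0.02917 \right)$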

Example. For Table 18 we have

Corrected $\mu_{2:u} = 1.720 - 0.083 = 1.637$
$\sigma_u = \sqrt{1.637} = 1.28$
Corrected $\sigma_x = 10(1.28) = 12.8\%$
Corrected $\mu_{4:u} = 9.4096 - (1.72)/2 + 7/240$
$= 8.5788$

$\alpha_4 = 8.5788/(1.637)^2 = 3.20$

The values of $\bar{x}$ and $\mu_3$ remain unchanged.
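The corrections may be sketched in code as follows. The function name `sheppard` is our own; the moments are assumed to be in the u unit (class interval unity), and the figures are those of Table 18 with c = 10.

```python
import math


def sheppard(mu2, mu3, mu4):
    """Apply Sheppard's corrections to moments in the u unit."""
    return mu2 - 1 / 12, mu3, mu4 - mu2 / 2 + 7 / 240


# Uncorrected moments about the mean for Table 18.
mu2_c, mu3_c, mu4_c = sheppard(1.72, -1.320, 9.4096)

sigma_x_corrected = 10 * math.sqrt(mu2_c)   # back to the x unit, c = 10
alpha4_corrected = mu4_c / mu2_c**2

print(round(mu2_c, 3), round(mu4_c, 4))
print(round(sigma_x_corrected, 1), round(alpha4_corrected, 2))
```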


Sheppard’s corrections are valid only for the bell-shaped types of 
distributions. They are not applicable to the J-shaped or U-shaped 
types. Moreover, they constitute a refinement which may not al- 
ways be consistent with the degree of accuracy in the original data. 
The errors of grouping (not mistakes) are usually small compared 
with the errors existing in the raw data. So, it seems that little 
would be gained by their use in a first course. We will occasionally 
use them in an illustration. 



CHAPTER V 

MEASURES OF DISPERSION 

1. Introduction. The concept of variability is fundamental today 
not only in the social sciences but also in the so-called exact physical 
sciences. Modern scientific method recognizes the existence of 
physical, moral, and mental inequalities. The principle of variabil- 
ity has come to be accepted as the natural order in social, economic, 
and physical phenomena. This principle is the very essence of the 
statistical nature of mass phenomena. In this connection, R. A. 
Fisher says: 1 

The conception of statistics as the study of variation is the natural outcome of
viewing the subject as the study of populations; for a population of individuals
in all respects identical is completely described by a description of any one
individual, together with the number in the group. The populations which are the
object of statistical study always display variation in one or more respects.
To speak of statistics as the study of variation also serves to emphasize the
contrast between the aims of modern statisticians and those of their predecessors.
For, until comparatively recent times, the vast majority of workers in this field 
appear to have had no other aim than to ascertain aggregate, or average, values. 
The variation itself was not an object of study, but was recognized rather as a 
troublesome circumstance which detracted from the value of the average. . . . Yet, 
from the modern point of view, the study of the causes of variation of any vari- 
able phenomena, from the yield of wheat to the intellect of man, should be 
begun by the examination of the variation which presents itself. The study 
of variation leads immediately to the concept of a frequency distribution. 

It is clearly important, therefore, in studying a distribution, to 
describe how the variates are clustered or scattered around an aver- 
age. Figure 13 shows how two distributions may even have the same 
mean and total frequency, yet differ considerably in variation from 
the mean. Such variation is commonly called dispersion, varia- 
bility, or spread. 

We will consider three measures of dispersion: Quartile Deviation,
Mean Deviation, and Standard Deviation, of which the last is by far
the most important.

1 R. A. Fisher, Statistical Methods for Research Workers, p. 3. Oliver and Boyd,
London.



Fig. 13. Two Distributions Differing in Dispersion 

2. The Quartile Deviation. Just as the median selects one point 
of division, we may now take two additional points such that they, 
together with the median, divide the whole distribution into four 
equal parts. These points are called the quartile values. 

The first quartile, denoted by $Q_1$, is that value of x for which
cum f = N/4. That is, one-fourth of all the variates in the
distribution are smaller in value than $Q_1$ and three-fourths of them are larger
than $Q_1$. The second quartile $Q_2$ is that value of x for which cum f
is N/2 and is therefore the median. The third quartile, denoted
by $Q_3$, is that value of x for which cum f = 3N/4. Hence fifty per
cent of the total frequency is included between $Q_1$ and $Q_3$.

Half of the distance between $Q_3$ and $Q_1$ is called the semi-interquartile
range or quartile deviation and will be denoted by Q. Thus,

(1) $Q = \frac{Q_3 - Q_1}{2}.$

It should be noted that the median does not necessarily come at the
mid-point of 2Q, i.e., that a distance Q laid off on either side of
$Q_2$ would not necessarily reach to $Q_1$ and $Q_3$. (See Figure 14.) (For
a symmetrical distribution, to be considered later, this would
be true.)

Fig. 14

As a measure of dispersion, Q gives a fairly good idea of the spread
of the variates, and is suitable as such a measure in those cases where
the median would be used as an average. The quartile values $Q_1$
and $Q_3$ are found, like the median, by interpolation in the cumulative
frequency table.


Example. (a) Find the median and the quartile deviation for the distribution
of IQ's in Table 6 (§10, Chapter I). (b) Illustrate the measures found in (a) by
means of a cum f graph.

    End      Cum f
    54.5        0
    64.5        3
    74.5       24
    84.5      102
    94.5      284
                      ← Med.
   104.5      589
   114.5      798
   124.5      879
   134.5      900
   144.5      N = 905

Solution:

N/4 = 226.25, N/2 = 452.5, 3N/4 = 678.75

$\frac{Q_1 - 84.5}{10} = \frac{226.25 - 102}{284 - 102}, \quad Q_1 = 91.3$

$\frac{Q_2 - 94.5}{10} = \frac{452.5 - 284}{589 - 284}, \quad Q_2 = 100.02$

$\frac{Q_3 - 104.5}{10} = \frac{678.75 - 589}{798 - 589}, \quad Q_3 = 108.8$

$Q = \frac{Q_3 - Q_1}{2} = 8.75.$
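The interpolation used in this solution can be written as a small routine. The sketch below is illustrative only; the function name `value_at` is ours, and the class ends and cumulative frequencies are those of the table above. Because it carries the unrounded quartiles, its value of Q differs slightly from the 8.75 obtained from the rounded $Q_1$ and $Q_3$.

```python
# Upper class ends 54.5, 64.5, ..., 144.5 and cumulative frequencies.
ends = [54.5 + 10 * i for i in range(10)]
cum = [0, 3, 24, 102, 284, 589, 798, 879, 900, 905]


def value_at(target):
    """Value of x at which cum f reaches target, by linear interpolation."""
    for i in range(1, len(ends)):
        if cum[i - 1] <= target <= cum[i] and cum[i] > cum[i - 1]:
            fraction = (target - cum[i - 1]) / (cum[i] - cum[i - 1])
            return ends[i - 1] + fraction * (ends[i] - ends[i - 1])
    raise ValueError("target lies outside the table")


n = cum[-1]
q1, q2, q3 = (value_at(k * n / 4) for k in (1, 2, 3))
q = (q3 - q1) / 2   # quartile deviation

print(round(q1, 1), round(q2, 2), round(q3, 1), round(q, 2))
```

The same routine serves for percentiles and deciles if the target is taken as mN/100 or nN/10.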


Figure 15 explains graphically the measures obtained by
interpolation from a cum f table. For convenience in drawing the figure,
the quartile labels are put on vertical lines. But one should
remember that the quartiles are values of x and that it is the horizontal
distances of the lines from the y-axis that represent these measures.
Exercises


1. Criticize the following "definitions":


Qi 




2. Find $Q_1$ and $Q_3$ from the cumulative frequency table which you made to
obtain the median for the Glasgow schoolgirl distribution. (Exercise 5
on page 52.)

3. Find the quartile deviation Q from your results in Exercise 2.

4. Find $Q_1$, $Q_2$, $Q_3$ for the distribution in Table 12, and compute Q.

5. Compute the value of the semi-interquartile range for other distributions
at the direction of the instructor.

6. The mth percentile $P_m$ of a frequency distribution is that value of the
variable x for which cum f = mN/100, where m = 1, 2, ..., 99. The 10th,
20th, 30th, ..., percentiles are called deciles. Therefore, the nth decile
$D_n$ is that value of x for which cum f = nN/10, where n = 1, 2, ..., 9.
Compute several percentiles and deciles of a distribution in the text.

3. Mean Deviation. As a measure of variation about a central
value, it would seem appropriate to take an average of all the
deviations about that central value. In the mean deviation (MD) about
the mean this is precisely what we do, namely, we find the arithmetic
mean of the numerical values of the deviations about the mean.
In summing the deviations, their absolute values are used because
regardless of whether deviations are positive or negative they have
the same influence on the amount of variation. Moreover, if their
algebraic signs are taken account of, the sum of such deviations is
zero (Theorem VI of Chapter III). Hence we sum them treating
all deviations as positive.

In mathematical symbols, vertical bars denote absolute values,
so we have 1

(2) $\text{MD} = \frac{1}{N} \sum f_i |x_i - \bar{x}|,$

if the x unit is used. When the class interval is the unit, we have

(3) $\text{MD} = \frac{1}{N} \sum f_i |u_i - \bar{u}|$

and

(4) MD (x unit) = c × MD (u unit).

It can be proved that the essentially positive function

$y = \frac{1}{N} \sum f_i (x_i - A)^2$

is a minimum when $A = \bar{x}$. (See Theorem II, page 99. Also by
the calculus $dy/dA = 0$ when $A = \bar{x}$.) It was in a similar
investigation to find the value of B for which the function

$y = \frac{1}{N} \sum f_i |x_i - B|$

is a minimum, that the median was discovered. When B is the
median this function is a minimum. 2 This property of the median has
some statistical importance in connection with the geographical
location of centers of industry and population. 3 Custom has
established the use of the mean rather than the median in this measure.
Hence "mean deviation" usually refers to the mean deviation from
the mean. It is also called "average deviation."
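The two minimum properties mentioned above can be examined numerically. The sketch below is an illustration under stated assumptions and not a proof: the seven variates are hypothetical, each with frequency 1, and the minimum is sought by trying every value on a grid of spacing 0.1.

```python
# Hypothetical ungrouped variates (each with frequency 1).
data = [2, 3, 5, 8, 13, 21, 34]
n = len(data)

mean = sum(data) / n
median = sorted(data)[n // 2]   # middle value, since n is odd


def mean_square(a):
    """Mean of squared deviations about a."""
    return sum((x - a) ** 2 for x in data) / n


def mean_absolute(b):
    """Mean of absolute deviations about b."""
    return sum(abs(x - b) for x in data) / n


# Trial values 0.0, 0.1, ..., 39.9.
candidates = [i / 10 for i in range(400)]
best_a = min(candidates, key=mean_square)
best_b = min(candidates, key=mean_absolute)

print(mean, best_a)     # best_a is the grid point nearest the mean
print(median, best_b)   # best_b coincides with the median
```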

1 Since all the data are not concentrated at the midpoints of the intervals, a
grouping error is involved here as in the formula for $\sigma$ (§10, Chapter IV). But
the mean deviation is used so infrequently that discussion here of the appropriate
correction hardly seems warranted. Those who may be interested will find a
more precise formula in the Handbook of Mathematical Statistics, Rietz and
others.

2 For a proof see reference 16, our Introduction.

3 See p. 85 of Elements of Statistics, Davis and Nelson. Principia Press.
Example. Find the mean deviation for the grades in Table 18 where the
mean value of x is 72.5.

     x       f    $|x - \bar{x}|$   $f|x - \bar{x}|$
   34.5      2        38             76
   44.5      3        28             84
   54.5     11        18            198
   64.5     20         8            160
   74.5     32         2             64
   84.5     25        12            300
   94.5      7        22            154
   Total   100                     1036

$\text{MD} = \frac{1036}{100} = 10.36.$
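The computation in this example may be sketched as follows, using the class marks and frequencies of Table 18. The variable names are our own.

```python
# Class marks and frequencies of Table 18.
marks = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [2, 3, 11, 20, 32, 25, 7]

n = sum(freqs)
mean = sum(f * x for f, x in zip(freqs, marks)) / n

# Formula (2): mean of the absolute deviations about the mean.
md = sum(f * abs(x - mean) for f, x in zip(freqs, marks)) / n

print(mean, md)
```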


What was $\sigma$ for this distribution?

The absolute value of a variable x', denoted by the symbol |x'|, is
not very tractable in mathematical operations. Therefore the mean
deviation is not favored by mathematicians since it is unwieldy in
the more theoretical and mathematical discussions. Its chief use
is in experimental work where occasional large and erratic deviations
are likely to occur. In such cases the standard deviation would tend
to emphasize these deviations.

If m of the N variates are greater than the mean, $\bar{x}$, then the mean
deviation may be written

    $MD = \dfrac{2}{N}\left[(\text{sum of variates greater than } \bar{x}) - m\bar{x}\right]$.

The student is given a hint in Exercise 34 at the end of Part I on
how to prove a similar formula for $x_i < \bar{x}$.
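
The relation MD = (2/N)[(sum of variates greater than the mean) - m(mean)] can be checked numerically. The Python sketch below is only an illustration (the function names are ours, not the text's); it applies both the definition and the shortcut to the grade distribution of the preceding example, counting each class mark as many times as its frequency.

```python
# Class marks and frequencies of the grade distribution in the example above.
marks = [34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
freqs = [2, 3, 11, 20, 32, 25, 7]


def grouped_mean(xs, fs):
    """Mean of a frequency distribution."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)


def mean_deviation(xs, fs):
    """Mean deviation from the mean, by the definition (1/N) * sum f|x - mean|."""
    mean = grouped_mean(xs, fs)
    return sum(f * abs(x - mean) for x, f in zip(xs, fs)) / sum(fs)


def mean_deviation_shortcut(xs, fs):
    """Mean deviation by (2/N) * [(sum of variates above the mean) - m * mean]."""
    mean = grouped_mean(xs, fs)
    m = sum(f for x, f in zip(xs, fs) if x > mean)          # number above the mean
    upper = sum(f * x for x, f in zip(xs, fs) if x > mean)  # sum of those variates
    return 2.0 * (upper - m * mean) / sum(fs)


print(grouped_mean(marks, freqs))             # mean of the grades
print(mean_deviation(marks, freqs))           # from the definition
print(mean_deviation_shortcut(marks, freqs))  # from the shortcut
```

Both routes should agree with the value 10.36 found in the example, since the positive deviations and the negative deviations from the mean have equal totals.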

4. The Standard Deviation. To overcome the difficulty of negative
deviations and the use of absolute value signs, the deviations
about the mean may be squared and the mean of these squares taken.
To get back into the original linear units, we take the positive square
root of this result, and have

(5)    $\sigma = \left[\dfrac{1}{N}\sum f_i (x_i - \bar{x})^2\right]^{1/2}$

as defined before. The standard deviation measures the same kind
of phenomenon as the mean deviation and this approach to it is
frequently satisfactory to a student who otherwise finds it difficult
to understand.¹

For a common type of distribution, the standard deviation is
approximately twenty-five per cent greater than the mean deviation.
Speaking more accurately, this is true of a normal distribution (to be
considered in Chapter VI) for which the relation is $MD = 0.7979\,\sigma$
(approximately).

It is often convenient to have a name for “the square of the
standard deviation,” and for this purpose the term “variance” has
been introduced. Thus $\sigma$ denotes standard deviation and $\sigma^2$
denotes variance.

Although definition (5) is the basic concept which the student
should have for the standard deviation, nevertheless in actual practice
it is seldom desirable to compute $\sigma$ directly from that definition.
For a frequency distribution the method is shown in the chapter on
moments. However, we will give an additional illustration here.

Example. Find the mean and the standard deviation of Table 9, using
Charlier’s check and Sheppard’s correction.

Solution: (See Table 21, p. 88.)

Charlier’s check: $\sum f(u + 1)^2 = \sum f u^2 + 2\sum f u + N$

    2471 = 2365 + 2(-447) + 1000 = 2471

Computations:

    $\bar{x} = 49.5 + 4(-.447) = 47.712$ lbs.

    $\mu_{2:u} = \nu_2 - \nu_1^2 = 2.365 - (-.447)^2 = 2.165$

¹ The term “standard deviation” was proposed by Pearson and is now used by
almost all English writers. As originally defined by Pearson, this is the square
root of the mean of the squares of deviations taken from the mean of the
distribution, and is not to be used when deviations are measured from any other
reference point. Pearson uses the term “root-mean-square” for a similar
measure when the deviations are taken around any origin other than the mean.
(Walker, History of Statistical Method, p. 54.)

Using Sheppard’s corrections,

    Corrected $\mu_{2:u} = 2.165 - .083 = 2.082$
    $\mu_{2:x} = 16(2.082) = 33.312$
    $\sigma_x = \sqrt{33.312} = 5.772$ lbs.

Table 21. Weights of Glasgow School Children

    Weight (x)       f      u       fu      fu²    f(u+1)²
    29.5 lbs.        1     -5       -5       25       16
    33.5            14     -4      -56      224      126
    37.5            56     -3     -168      504      224
    41.5           172     -2     -344      688      172
    45.5           245     -1     -245      245        0
    49.5           263      0        0        0      263
    53.5           156      1      156      156      624
    57.5            67      2      134      268      603
    61.5            23      3       69      207      368
    65.5             3      4       12       48       75
    Sums          1000            -447     2365     2471
    (Sums)/N         1           -.447    2.365
                                 (= ū)   (= ν₂)



It will be proved later that for a certain ideal type of distribution
which is often approximated in practical statistics the range $\bar{x} \pm \sigma_x$
includes about two thirds of the variates. Assuming the above
distribution is of this type we could say that about two thirds of the
children weighed between 42 pounds and 53.5 pounds. Such a statement
assists one in comprehending certain characteristics of the data
though the distribution actually may not be before him.
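
The computation for Table 21 can be retraced in a few lines of code. The following Python sketch is an illustration only (the function name and its parameters are our own); it codes the class marks as u = (x - x₀)/c with x₀ = 49.5 and c = 4, applies Charlier's check, and subtracts Sheppard's correction of 1/12 (the .083 used above) from the second moment in class units.

```python
import math

# Table 21: class marks (lbs.) and frequencies.
marks = [29.5, 33.5, 37.5, 41.5, 45.5, 49.5, 53.5, 57.5, 61.5, 65.5]
freqs = [1, 14, 56, 172, 245, 263, 156, 67, 23, 3]


def coded_mean_and_sigma(marks, freqs, x0, c, sheppard=True):
    """Mean and standard deviation of equispaced grouped data via u = (x - x0)/c."""
    n = sum(freqs)
    us = [round((x - x0) / c) for x in marks]
    sum_fu = sum(f * u for f, u in zip(freqs, us))
    sum_fu2 = sum(f * u * u for f, u in zip(freqs, us))

    # Charlier's check: sum f(u+1)^2 must equal sum fu^2 + 2 sum fu + N.
    # In hand work this guards against slips in the column totals.
    check = sum(f * (u + 1) ** 2 for f, u in zip(freqs, us))
    if check != sum_fu2 + 2 * sum_fu + n:
        raise ArithmeticError("Charlier's check failed")

    nu1 = sum_fu / n                # first moment in u units
    nu2 = sum_fu2 / n               # second moment about the provisional origin
    mu2_u = nu2 - nu1 ** 2          # second moment about the mean, in u units
    if sheppard:
        mu2_u -= 1.0 / 12.0         # Sheppard's correction in class units
    return x0 + c * nu1, c * math.sqrt(mu2_u)


mean, sigma = coded_mean_and_sigma(marks, freqs, x0=49.5, c=4)
print(round(mean, 3), round(sigma, 3))
```

Because the sketch carries full precision instead of rounding 2.165 and .083 at each step, its standard deviation may differ from the 5.772 lbs. of the text in the last decimal place.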

It is understood that the method of computation described above
is to be used when the class marks are equispaced. If the class
intervals are unequal we must choose c = 1 unless the x’s denoting
the class marks have a common factor c. When c = 1, u becomes
$u = x - x_0$, and the work may be simplified a little by an appropriate
choice of $x_0$.

Exercises

1. (Pearson). The following data represent the percentage of ash-content in
280 wagon tests of a certain kind of coal. Find the mean and the standard
deviation of the distribution:

    Percentage
    Ash-Content    Frequency
      3.0- 3.9          1
      4.0- 4.9          7
      5.0- 5.9         28
      6.0- 6.9         78
      7.0- 7.9         84
      8.0- 8.9         45
      9.0- 9.9         28
     10.0-10.9          7
     11.0-11.9          2

Ans. $\bar{x} = 7.35\%$, $\sigma_x = 1.36\%$.

2. (Camp). Find the mean wage and the standard deviation of the following
data:

    Class           Frequency
    $4.50- 5.99         43
     6.00- 7.49         99
     7.50- 8.99        152
     9.00-10.49        178
    10.50-11.99        160
    12.00-13.49         40
    13.50-14.99         25
    15.00-16.49          3

Ans. N = 700, $\bar{x} = \$9.42$, $\sigma_x = \$2.19$.


3. Given $\sigma_x = 2.19$ for the following (x, f) distribution, find $\sigma_v$ and $\sigma_u$ for the
(v, f) and (u, f) distributions, respectively.

    f     43     99    152    178    160     40     25      3
    x    4.5    6.0    7.5    9.0   10.5   12.0   13.5   15.0
    v      0      1      2      3      4      5      6      7
    u     -3     -2     -1      0      1      2      3      4

What relation and theorem in Chapter IV does this illustrate?

4. Find the variance $\sigma_x^2$ of Table 16 (§8, Chapter III).

5. Compute the value of the ratio $MD/\sigma$ for the data in Exercise 1 above.

6. Find the mean and standard deviation for the data in Table 10.

7. Find the mean and standard deviation for the data in Table 11.

8. Transform the variates of the following distribution into standard units:



5. Relative Dispersions. The full significance of different values
of $\sigma$ can be obtained only by experience, but it is obvious that a small
standard deviation indicates that the variates are closely clustered
about the mean; whereas a large standard deviation indicates that
these values are spread out widely from the mean. (See Figure 13.)

The size of variates usually influences not only the mean but also
deviations from the mean. In other words, the magnitudes of the
deviations from the mean seem to be dependent, in some degree, upon
the magnitude of the mean. In comparing dispersion in distributions,
we may correct for differences in the average magnitudes of
positive variates by taking the ratio of the standard deviation to the
mean. Thus, the quantity

(6)    $\dfrac{\sigma}{\bar{x}}$

is known as the coefficient of variation. It is obviously an abstract
number, being independent of the units of measurement, and it is
usually expressed as a percentage.

The use of (6) may be misleading in situations where the origin
from which the data are measured is somewhat arbitrary. Cases
in point are temperature measurements and certain psychological
data. Further discussion of such limitations of (6) will be found in
references 2, 14, and 15, listed in the Introduction.
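
The warning about an arbitrary origin is easy to see numerically. In the Python sketch below (an illustration only; the temperature readings are hypothetical and the function name is ours), the same five readings are expressed on the Celsius and Fahrenheit scales. The ratio of standard deviation to mean changes with the scale because the two scales place zero at different points, although the readings themselves are physically identical.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean, expressed as a percentage."""
    mean = statistics.mean(values)
    if mean <= 0:
        raise ValueError("the coefficient of variation assumes a positive mean")
    # pstdev divides by N, matching the definition of sigma used in the text.
    return 100.0 * statistics.pstdev(values) / mean


# Hypothetical temperature readings, for illustration only.
celsius = [10.0, 15.0, 20.0, 25.0, 30.0]
fahrenheit = [9.0 * c / 5.0 + 32.0 for c in celsius]

v_celsius = coefficient_of_variation(celsius)
v_fahrenheit = coefficient_of_variation(fahrenheit)
print(round(v_celsius, 1), round(v_fahrenheit, 1))
```

For measurements with a natural zero, such as weights or yields, a change of unit multiplies the mean and the standard deviation by the same factor and leaves the ratio unchanged.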

6. Scaling a Distribution in Terms of $\sigma$. Suppose we lay off
intervals of length $\sigma$ on either side of the mean (Figure 16). Then
for a certain type of distribution known as the normal curve (which
will be considered in the next chapter) the following properties can
be proved:

(1) The percentage of the total frequency lying outside the range
$\bar{x} \pm \sigma$ is 32% approximately.

(2) The percentage outside $\bar{x} \pm 2\sigma$ is 5% approximately.

(3) The range $\bar{x} \pm 3\sigma$ includes practically the whole distribution,
i.e., the total range is $6\sigma$ approximately.

[Fig. 16]

The student will recognize that these ranges are, in standard units,
t = ±1, t = ±2, t = ±3, respectively. These results follow from
the relation

    $t = \dfrac{x - \bar{x}}{\sigma}, \qquad x = \bar{x} + t\sigma.$

Sometimes it is important in a statistical analysis to know how
nearly the given variates are distributed in accordance with the
above property of the normal curve. The distribution of Table 21
has been scaled off in this manner, with the results shown in Table
22. Figure 17 will be helpful in verifying them.

[Fig. 17. Distribution of Table 21 Scaled Off in Units of σ]

Table 22. Results of Scaling Off Table 21
($\bar{x} = 47.712$, $\sigma_x = 5.772$)

                                                          Frequency outside
                                                          the given range
                                               Range      Number   Percent
    x̄ - σ  = 41.940     x̄ + σ  = 53.484       x̄ ± σ        348      34.8
    x̄ - 2σ = 36.168     x̄ + 2σ = 59.256       x̄ ± 2σ        60       6.0
    x̄ - 3σ = 30.396     x̄ + 3σ = 65.028       x̄ ± 3σ         3       0.3

We will verify here the 34.8% given in Table 22, and the student
is asked to verify the others in Exercise 2. The range $\bar{x} \pm \sigma$ (Figure
17) evidently includes all the variates represented by the two central
rectangles and proportionate parts of the two adjoining rectangles.
From 39.50 to 41.94 is 2.44, and since the variates are assumed to be
uniformly distributed over the class interval we have 172(2.44/4) =
104.92 for the proportionate number to be excluded in the class
39.5-43.5. Hence the number below $\bar{x} - \sigma$ is (1 + 14 + 56 +
104.92) = 175.92. Similarly, from 53.484 to 55.5 is 2.016, and we
have 156(2.016/4) = 78.624 as the proportionate number excluded
in the class 51.5-55.5. Hence the total above $\bar{x} + \sigma$ is (78.624 +
67 + 23 + 3) = 171.624. So the total number outside $\bar{x} \pm \sigma$ is
(171.624 + 175.92) = 347.544, or 348 to the nearest unit, which is
34.8% of the 1000 variates. This result could also be obtained as
follows: By forming a cum f table and interpolating in the end x
column we find

    cum f at x = 53.484:                   828
    cum f at x = 41.940:                   176
    Number in the (x̄ ± σ_x) interval:      652
    Number outside this interval:          348
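
The interpolation just described lends itself to a short program. The Python sketch below is an illustration under the same assumption as the text, namely that the variates are spread uniformly over each class interval of width 4; the function names are ours. It reproduces the 175.92 below and 171.624 above found for the range of one standard deviation on either side of the mean.

```python
# Table 21: class marks (lbs.) and frequencies; class interval of width 4.
marks = [29.5, 33.5, 37.5, 41.5, 45.5, 49.5, 53.5, 57.5, 61.5, 65.5]
freqs = [1, 14, 56, 172, 245, 263, 156, 67, 23, 3]
WIDTH = 4.0
MEAN, SIGMA = 47.712, 5.772   # values computed in the text


def cumulative_below(x, marks, freqs, width):
    """Number of variates below x, interpolating linearly inside a class."""
    total = 0.0
    for mark, f in zip(marks, freqs):
        lo, hi = mark - width / 2.0, mark + width / 2.0
        if x >= hi:
            total += f                          # whole class lies below x
        elif x > lo:
            total += f * (x - lo) / width       # proportionate part of the class
    return total


def number_outside(t, mean, sigma, marks, freqs, width):
    """Number of variates outside the range mean - t*sigma to mean + t*sigma."""
    n = sum(freqs)
    below = cumulative_below(mean - t * sigma, marks, freqs, width)
    above = n - cumulative_below(mean + t * sigma, marks, freqs, width)
    return below + above


outside_1 = number_outside(1, MEAN, SIGMA, marks, freqs, WIDTH)
print(round(outside_1, 3), round(100.0 * outside_1 / sum(freqs), 1))
```

The same function may be called with t = 2 or t = 3 when working Exercise 2 below.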


7. Semi-interquartile Range in Terms of $\sigma$. The range $(Q_3 - Q_1)/2$
when expressed in units of $\sigma$ has a significance in a normal distribution,
as will be shown later. We will denote this by s; hence

    $s = \dfrac{Q_3 - Q_1}{2\sigma}$, and $s = \dfrac{Q}{\sigma}$, where $Q = (Q_3 - Q_1)/2$.

For the present we merely calculate its value in the exercises below.


Exercises

1. Find the mean and standard deviation for the distribution of Lengths of
Telephone Calls, given in Table 8 (Chapter I). Use Charlier's check.

2. In the three distributions named, show that the percentages outside $\bar{x} + t\sigma$ for
t = ±1, ±2, and ±3, are as stated in Table 23. Verify also the values of s.

Table 23

                                    Percent Outside
    Distribution         N       x̄ ± σ    x̄ ± 2σ    x̄ ± 3σ       s
    Glasgow girls      1000      34.8      6.0       0.3       0.675
    Telephone calls     995      32.7      5.0       0.4       0.69
    Span               2000      31.8      4.2       0.5       0.665


8. N Small. Ungrouped Data. When N is small it is seldom desirable
to attempt an arrangement of the variates into a frequency
distribution. Moreover, in this case, the values of $\alpha_3$ and $\alpha_4$ are not
usually needed because the applications of these measures relate to
characteristics of large distributions. Therefore, only the mean and
standard deviation are usually required for a small set of ungrouped
data. The following methods will help the student become familiar
with the several formulas for $\sigma$ which may be used in this case.

Table 24. Average Yields of Corn in Bushels per Acre
for a Certain Section in Illinois from 1901-1920

    Year     Yield (x)      u      u²
    1901        21        -15     225
    1902        39          3       9
    1903        32         -4      16
    1904        37          1       1
    1905        40          4      16
    1906        36          0       0
    1907        36          0       0
    1908        32         -4      16
    1909        36          0       0
    1910        39          3       9
    1911        33         -3       9
    1912        40          4      16
    1913        27         -9      81
    1914        29         -7      49
    1915        36          0       0
    1916        30         -6      36
    1917        38          2       4
    1918        36          0       0
    1919        36          0       0
    1920        35         -1       1
    Totals     688        -32     488

Method I. The indirect method involving the u unit may still be
used for finding the first and second moments. Since each variate
is being treated separately f = 1, and we compute the values of

    $\nu_r = \dfrac{1}{N}\sum u^r$ for r = 1 and 2.

If the values of x are unequally spaced
we take c = 1 and let $u = x - x_0$, which changes the origin but not
the units. In other words, the procedure is the same as for a
frequency distribution except that f = 1 and c = 1.

Example. Find the mean and standard deviation for Table 24. N = 20.
We choose $x_0 = 36$.


Table 25

       x      x' = x - x̄       x'²
      21       -13.4         179.56
      27        -7.4          54.76
      29        -5.4          29.16
      30        -4.4          19.36
      32        -2.4           5.76
      32        -2.4           5.76
      33        -1.4           1.96
      35         0.6            .36
      36         1.6           2.56
      36         1.6           2.56
      36         1.6           2.56
      36         1.6           2.56
      36         1.6           2.56
      36         1.6           2.56
      37         2.6           6.76
      38         3.6          12.96
      39         4.6          21.16
      39         4.6          21.16
      40         5.6          31.36
      40         5.6          31.36
     688    Σ|x'| = 73.6     436.80

Computations:

    $\nu_1 = \bar{u} = \dfrac{-32}{20} = -1.6$;    $\bar{x} = x_0 + \bar{u} = 36 - 1.6 = 34.4$ bushels

    $\nu_2 = \dfrac{488}{20} = 24.40$;    $\mu_2 = \nu_2 - \bar{u}^2 = 24.40 - 2.56 = 21.84.$

Therefore,

    $\sigma_x = \sigma_u = \sqrt{21.84} = 4.67$ bushels.


Method II. When f = 1, formula (5) becomes

(7)    $\sigma_x = \left[\dfrac{1}{N}\sum (x_i - \bar{x})^2\right]^{1/2}$,

and sometimes it is best to compute the standard deviation directly
from this definition, without the use of the u unit. Thus the origin
is placed at the mean and all indirect methods are abandoned. If
the mean deviation is also desired, clearly this method should be
used. It is exemplified in Table 25 for the preceding example, and
the variates have been arranged in order of magnitude.

    $\bar{x} = \dfrac{688}{20} = 34.4$ bushels

    $\sigma_x^2 = \dfrac{436.80}{20} = 21.84$

    $\sigma_x = 4.67$ bushels

    $MD = \dfrac{1}{N}\sum |x'| = \dfrac{73.6}{20} = 3.68$ bushels.


Method III. From the relation

    $\mu_2 = \nu_2 - \nu_1^2$

we have

    $\mu_2 = \dfrac{1}{N}\sum x^2 - \bar{x}^2$

when f = 1. Therefore $\sigma$ may be written

(8)    $\sigma_x = \left[\dfrac{1}{N}\sum x^2 - \bar{x}^2\right]^{1/2}$.

This method is perhaps the best when the values of x are not large or
when a table of squares is available. It is illustrated below for the
preceding example. (See Table 26.)

Table 26

       x        x²
      21       441
      27       729
      29       841
      30       900
      32      1024
      32      1024
      33      1089
      35      1225
      36      1296
      36      1296
      36      1296
      36      1296
      36      1296
      36      1296
      37      1369
      38      1444
      39      1521
      39      1521
      40      1600
      40      1600
     688    24,104

Computations:

    $\bar{x} = \dfrac{1}{N}\sum x = \dfrac{688}{20} = 34.4$ bushels

    $\bar{x}^2 = (34.4)^2 = 1183.36$

    $\sigma_x = [1205.20 - 1183.36]^{1/2} = (21.84)^{1/2} = 4.67$ bushels.
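
The three methods can be compared side by side in code. The following Python sketch is an illustration only (the function names are ours); it applies each method to the corn yields of Table 24 and should return the same mean and standard deviation from all three, together with the mean deviation from Method II.

```python
import math

# Table 24: average yields of corn, 1901-1920 (bushels per acre).
yields = [21, 39, 32, 37, 40, 36, 36, 32, 36, 39,
          33, 40, 27, 29, 36, 30, 38, 36, 36, 35]


def method_1(xs, x0):
    """Indirect method: moments of u = x - x0 about the provisional origin x0."""
    n = len(xs)
    us = [x - x0 for x in xs]
    nu1 = sum(us) / n
    nu2 = sum(u * u for u in us) / n
    return x0 + nu1, math.sqrt(nu2 - nu1 ** 2)


def method_2(xs):
    """Direct method, formula (7): deviations taken from the mean itself."""
    n = len(xs)
    mean = sum(xs) / n
    devs = [x - mean for x in xs]
    sigma = math.sqrt(sum(d * d for d in devs) / n)
    mean_dev = sum(abs(d) for d in devs) / n
    return mean, sigma, mean_dev


def method_3(xs):
    """Formula (8): mean of the squares minus the square of the mean."""
    n = len(xs)
    mean = sum(xs) / n
    return mean, math.sqrt(sum(x * x for x in xs) / n - mean ** 2)


print(method_1(yields, 36))
print(method_2(yields))
print(method_3(yields))
```

On a machine, Method III subtracts two large and nearly equal numbers, so Method II is usually the safer choice for data with many significant figures; for hand computation with a table of squares the trade-off is the one described in the text.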


Miscellaneous Exercises

1. (a) Verify that the algebraic sum of the numbers in the x' column of Table 25
is zero.

(b) Verify the value of mean deviation given for Table 25.

2. Using your own judgment as to the most appropriate method, find the mean
and standard deviation for each of the two sets of data, $x_1$ and $x_2$:

    x₁:  88  95  68  73  75  88  57  68  62  79  73  74  78
         80  57  65  69  74  78  72  59  47  56  67  43

    x₂:  82  86  75  78  72  79  63  65  67  75  68  70  79
         78  51  58  65  69  68  83  80  42  43  48  47

Answers: $\bar{x}_1 = 69.80$, $\sigma_1 = 12.13$; $\bar{x}_2 = 67.64$, $\sigma_2 = 12.68$.


3. Complete the computations and find the mean and variance of the following
distribution:

Hint. Here we let $v = y - y_0$. Then $y = v + y_0$, and $\sigma_y^2 = \sigma_v^2$ since c = 1.
(See Theorem on p. 69.)

Ans. $\bar{y} = 87.31$, $\sigma_y^2 = 56.66$.

4. Data have been gathered showing the points scored on a mental test by
290 prospective employees and the per cent of standard production
attained by these same 290 persons after being employed.¹ The following
statistics were obtained:

    Mental test:          mean = 43.33 pts.
                          $\sigma$ = 9.25 pts.

    Productive ability:   mean = 92.02%
                          $\sigma$ = 24.47%

(a) Compare the relative dispersion in mental test and productive ability.

(b) What factors, other than mental level, may have affected dispersion
under factory conditions?

1 Wembridge, “Experiment and Statistics in the Selection of Employees,” 
Journal of the American Statistical Association , March 1923, p. 605. 













5. Read and abstract the article “Variability,” Journal of Educational Research,
vol. 4, no. 3, pp. 221-26.

6. Find the median for Table 26.

7. Find $\bar{x}$, $\sigma_x$, MD, and Q for the following distribution.

    mid-x    2    4    6    8    10
    f        1    4    6    4     1

8. Show that (8) may be written as follows:

    $\sigma_x = \dfrac{1}{N}\left[N\sum x^2 - \left(\sum x\right)^2\right]^{1/2}$.

9. If the variates are all equal, say each $x_i = k$, show that $\bar{x} = k$ and $\sigma = 0$.

10. For a set of ungrouped data it is found that N = 16, $\sum x = 480$, $\sum x^2 = 15{,}735$.
Find $\bar{x}$ and $\sigma_x$.

11. Find the variance of the following data.

    5.7  6.2  6.5  6.0  6.3  5.8  5.7  6.0  6.0  5.8

Ans. $\sigma_x^2 = .064$.

12. Prove the identity:

    $(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_N - \bar{x})^2 = (x_1^2 + x_2^2 + \cdots + x_N^2) - N\bar{x}^2$.

13. Compute the mean deviation (from the mean) for the following data:

    x    2    4    6    8    10    17
    f    1    6   10    7     2     2

Ans. MD = 33/14.

14. Verify the identity (where $\bar{x}$ is the mean of $x_1$ and $x_2$):

    $(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 = \tfrac{1}{2}(x_1 - x_2)^2$,

and thus show that, for two variates,

    $\sigma = \tfrac{1}{2}\,|x_1 - x_2|$.

15. Verify the identity (where $\bar{x}$ is the mean of $x_1$, $x_2$, $x_3$):

    $3(x_1 - x_2)^2 + (x_1 + x_2 - 2x_3)^2 = 6\sum_{i=1}^{3}(x_i - \bar{x})^2$.

9. The Standard Deviation of the Combination of Sets. The
following theorems involving $\sigma$ are interesting in themselves and
have useful applications.

The relation $\mu_2 = \nu_2 - \nu_1^2$ is true in a more general sense than we
have previously used. Its generalized meaning will be revealed in
our first theorem.

Theorem I. The second moment about the mean equals the second
moment about an arbitrary point $P(x_0, 0)$ minus the square of the
distance between the mean and P.

Stated in symbols the theorem may be clearer. Suppose we have
a set of N variates whose mean is $\bar{x}$. Graphically, $\bar{x}$ is a point on the
x-axis. Then if P is any other point on the x-axis, according to
Theorem I we have

(9)    $\dfrac{1}{N}\sum (x - \bar{x})^2 = \dfrac{1}{N}\sum (x - x_0)^2 - (\bar{x} - x_0)^2.$

To prove this relation we may write

    $(x - \bar{x}) = (x - x_0) - (\bar{x} - x_0).$

Then

    $\dfrac{1}{N}\sum (x - \bar{x})^2 = \dfrac{1}{N}\sum \left[(x - x_0) - (\bar{x} - x_0)\right]^2,$

the right member of which simplifies into the right member of (9),
since $\sum (x - x_0) = N(\bar{x} - x_0)$ makes the cross-product term equal to
$-2(\bar{x} - x_0)^2$.
The generality of the theorem consists in extending the original
definition of $\nu_2$ and $\nu_1$ so that they refer to moments about any point
P on the x-axis (except $\bar{x}$), and not merely about zero. Thus now,

    $\nu_2 = \dfrac{1}{N}\sum (x - x_0)^2.$

If we take $x_0 = 0$ we have the original definition of $\nu_2$. Also, when P moves
back to zero, we see that $\nu_1$ becomes $\bar{x}$. In other words, the original
definitions of the $\nu$’s are merely the more general definitions when
zero is the value chosen for the arbitrary point. (See (1a) of Chapter IV.)

Theorem II. The sum of the squares of deviations of the variates
from their mean is less than the sum of the squares of the deviations of
the variates from any other value. Therefore $\sigma$ is less than any similar
“root-mean-square.”

The proof consists in showing that $\mu_2 < \nu_2$, which is left to the
student as an exercise.
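
Theorems I and II can be illustrated numerically before they are proved. The Python sketch below is only such an illustration (the function name and the trial points are ours); using the corn yields of Table 24, it computes the second moment about several arbitrary points P and checks that it exceeds the second moment about the mean by exactly the square of the distance from P to the mean.

```python
# Table 24: average yields of corn, 1901-1920 (bushels per acre).
yields = [21, 39, 32, 37, 40, 36, 36, 32, 36, 39,
          33, 40, 27, 29, 36, 30, 38, 36, 36, 35]


def second_moment_about(xs, p):
    """Mean of the squared deviations of the variates from the point p."""
    return sum((x - p) ** 2 for x in xs) / len(xs)


mean = sum(yields) / len(yields)
mu2 = second_moment_about(yields, mean)        # second moment about the mean

for p in (0, 30, 36, 50):                      # arbitrary trial points
    nu2 = second_moment_about(yields, p)
    # Theorem I: mu2 = nu2 - (mean - p)^2
    assert abs(mu2 - (nu2 - (mean - p) ** 2)) < 1e-9
    # Theorem II: the moment about any other point is larger
    assert nu2 > mu2
    print(p, round(nu2, 2), round(nu2 - (mean - p) ** 2, 2))
```

A numerical check of this kind is no substitute for the algebraic proof asked of the student, but it shows what the theorems assert.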

Theorem III. Let there be one set of $n_1$ variates $x_{1i}$ $(i = 1, 2, \cdots, n_1)$
and another set of $n_2$ variates $x_{2i}$ $(i = 1, 2, \cdots, n_2)$ and let $\bar{x}$ be the
mean of the combined sets (Theorem VIII, Chapter III). The variance
$\sigma^2$ of the set formed by the combination of these two sets is given by
the following formula:

(10)    $N\sigma^2 = \sum_{i=1}^{n_1} (x_{1i} - \bar{x})^2 + \sum_{i=1}^{n_2} (x_{2i} - \bar{x})^2$

where

    $N = n_1 + n_2.$

Proof: The proof consists in showing that

    $\sum_{i=1}^{n_1} (x_{1i} - \bar{x})^2 + \sum_{i=1}^{n_2} (x_{2i} - \bar{x})^2 = \sum_{i=1}^{n_1 + n_2} (x_i - \bar{x})^2$

which is left as an exercise for the student.

The above theorem is not very important in itself but it is useful
in proving the next theorem which gives the relation between the
variance of a composite set and the variances of sub-sets.

Theorem IV. Let the frequency, mean, and standard deviation be
denoted by $n_1$, $\bar{x}_1$, and $\sigma_1$ for one set of variates and by $n_2$, $\bar{x}_2$, and $\sigma_2$ for a
second set. The variance $\sigma^2$ of the composite set is given by the following
relation:

    $N\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 d_1^2 + n_2 d_2^2,$

where $N = n_1 + n_2$, $d_1 = \bar{x}_1 - \bar{x}$, $d_2 = \bar{x}_2 - \bar{x}$, and $\bar{x}$ is the mean of
the composite set.

Proof: For the $n_1$ set, $\bar{x}$ may be regarded as an arbitrary point P.
Hence by Theorem I we have

    $\dfrac{1}{n_1}\sum_{i=1}^{n_1} (x_{1i} - \bar{x}_1)^2 = \dfrac{1}{n_1}\sum_{i=1}^{n_1} (x_{1i} - \bar{x})^2 - (\bar{x}_1 - \bar{x})^2.$

Multiplying through by $n_1$ this becomes

(11)    $n_1\sigma_1^2 = \sum_{i=1}^{n_1} (x_{1i} - \bar{x})^2 - n_1 d_1^2.$

Similarly for the $n_2$ group we have

(12)    $n_2\sigma_2^2 = \sum_{i=1}^{n_2} (x_{2i} - \bar{x})^2 - n_2 d_2^2.$

Adding (11) and (12), and using (10), we obtain

    $n_1\sigma_1^2 + n_2\sigma_2^2 = N\sigma^2 - n_1 d_1^2 - n_2 d_2^2.$

Hence,

(13)    $N\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 d_1^2 + n_2 d_2^2.$

For k sets combined into a single set we can generalize (13) into
the following relation:

(14)    $N\sigma^2 = \sum_{i=1}^{k} n_i\sigma_i^2 + \sum_{i=1}^{k} n_i d_i^2$

where $N = \sum n_i$ and $d_i = \bar{x}_i - \bar{x}$. It is interesting to observe that
$\dfrac{1}{N}\sum_{i=1}^{k} n_i d_i^2$ is the variance of the means of the sub-sets. Thus we have
the important relation

(14a)    $\sigma^2 = \dfrac{1}{N}\sum_{i=1}^{k} n_i\sigma_i^2 + \sigma_{\bar{x}_i}^2$

which shows that the total variance may be broken up into two parts,
one of which is the weighted mean of the variances in the sub-sets
and the other is the variance of their means. These two parts are
sometimes called the average variance within classes and the variance
between the means of the classes. They become very important in
the “Analysis of Variance” (which is explained in Part II).
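
The splitting of the total variance into a part within the sub-sets and a part between their means can be tried on actual figures. The Python sketch below is an illustration only (the helper names are ours); it uses the two sets of 25 scores given in Exercise 2 of the Miscellaneous Exercises above, computes the two parts from the sub-set sizes, means, and variances, and compares their sum with the variance obtained directly from the 50 pooled values.

```python
import math

# The two sets of data from Exercise 2 of the Miscellaneous Exercises.
x1 = [88, 95, 68, 73, 75, 88, 57, 68, 62, 79, 73, 74, 78,
      80, 57, 65, 69, 74, 78, 72, 59, 47, 56, 67, 43]
x2 = [82, 86, 75, 78, 72, 79, 63, 65, 67, 75, 68, 70, 79,
      78, 51, 58, 65, 69, 68, 83, 80, 42, 43, 48, 47]


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    """Variance with divisor N, as defined in the text."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def decompose(groups):
    """Grand mean, average variance within sub-sets, variance between their means."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(len(g) * mean(g) for g in groups) / n_total
    within = sum(len(g) * variance(g) for g in groups) / n_total
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups) / n_total
    return grand_mean, within, between


grand_mean, within, between = decompose([x1, x2])
total = variance(x1 + x2)                       # direct computation on all 50 values
print(round(grand_mean, 2), round(within, 2), round(between, 2))
print(round(total, 2), round(math.sqrt(total), 2))
```

The function accepts any number of sub-sets, so the same sketch serves for the general case of k sets.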
Corollary I. Equation (13) may be written in the following form:

(15)    $N\sigma^2 = n_1(\sigma_1^2 + \bar{x}_1^2) + n_2(\sigma_2^2 + \bar{x}_2^2) - N\bar{x}^2.$

Proof: Since

    $n_1 d_1^2 = n_1(\bar{x}_1 - \bar{x})^2 = n_1\bar{x}_1^2 - (2n_1\bar{x}_1\bar{x} - n_1\bar{x}^2)$

and

    $n_2 d_2^2 = n_2(\bar{x}_2 - \bar{x})^2 = n_2\bar{x}_2^2 - (2n_2\bar{x}_2\bar{x} - n_2\bar{x}^2)$

the proof consists in showing that the sum of the terms in the end
parentheses above reduces to $N\bar{x}^2$. Rearranging these terms their
sum is

    $2\bar{x}(n_1\bar{x}_1 + n_2\bar{x}_2) - \bar{x}^2(n_1 + n_2),$

which by Theorem VIII (Chapter III) becomes

    $2\bar{x}\,N\bar{x} - \bar{x}^2 N = N\bar{x}^2.$

Generalizing for k groups, (15) becomes

(16)    $N\sigma^2 = \sum_{i=1}^{k} n_i(\sigma_i^2 + \bar{x}_i^2) - N\bar{x}^2.$

Corollary II. Equation (13) may also be written in the form:

(17)    $N\sigma^2 = n_1\sigma_1^2 + n_2\sigma_2^2 + \dfrac{n_1 n_2}{N}(\bar{x}_1 - \bar{x}_2)^2.$

The proof consists in showing that

    $n_1 d_1^2 + n_2 d_2^2 = \dfrac{n_1 n_2}{N}(\bar{x}_1 - \bar{x}_2)^2.$

This is left as an exercise.

For purposes of computation, (17) may be more convenient than
either (13) or (15) because it does not require $\bar{x}$, but it does not lend
itself to a generalization for k sets. Generalizations may be useful
both for computing and for theoretical purposes. Formula (14) is
particularly useful in developing the theory of a later section.

For convenience, the formulas of Theorem VIII, Chapter III, are
repeated here:

(18)    $\bar{x} = \dfrac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2},$

(18a)    $\bar{x} = \dfrac{1}{N}\sum_{i=1}^{k} n_i\bar{x}_i, \qquad N = \sum_{i=1}^{k} n_i.$

Theorem V. Consider k sets. Suppose the second moment of each
set is taken about the mean, $\bar{x}$, of the combined sets. Let $\nu_2^{(i)}$ represent
this moment for the ith set. Then the variance $\sigma^2$ for the combined sets
is given by

(19)    $N\sigma^2 = n_1\nu_2^{(1)} + n_2\nu_2^{(2)} + \cdots + n_k\nu_2^{(k)} = \sum_{i=1}^{k} n_i\nu_2^{(i)}$

where $n_i$ represents the frequency in the ith set and $\sum n_i = N$.

Proof: We may write (10) in the form

    $N\sigma^2 = n_1\nu_2^{(1)} + n_2\nu_2^{(2)}.$

So, generalizing this form of (10) for k sets, we obtain (19).

The next theorem gives the standard deviation of the distribution
formed by the first N integers, that is, when x = 1, 2, 3, ···, N. It is
useful in cases when the variates are recorded not by measurements
but by their respective positions when ranked in order with respect
to some character or property.

Theorem VI. The standard deviation $\sigma$ of the first N natural numbers
is given by

(20)    $\sigma = \sqrt{\dfrac{N^2 - 1}{12}}.$

Proof: By a fundamental definition we have

    $\sigma^2 = \dfrac{1}{N}\sum x^2 - \bar{x}^2,$

and by Theorems IV and V of Chapter III, this becomes

    $\sigma^2 = \tfrac{1}{6}(N + 1)(2N + 1) - \tfrac{1}{4}(N + 1)^2$

which reduces to

    $\sigma^2 = \dfrac{N^2 - 1}{12},$

whence we obtain (20).
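
Formula (20) may be compared with a direct computation for a few values of N. The Python sketch below is an illustration only (the function names are ours); it evaluates the standard deviation of the integers 1, 2, ..., N from the definition and from the closed form and confirms that the two agree.

```python
import math


def sigma_first_n(n):
    """Standard deviation of 1, 2, ..., n by the closed form sqrt((n^2 - 1)/12)."""
    return math.sqrt((n * n - 1) / 12.0)


def sigma_direct(n):
    """Standard deviation of 1, 2, ..., n computed from the definition."""
    xs = range(1, n + 1)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)


for n in (1, 2, 5, 10, 100):
    closed, direct = sigma_first_n(n), sigma_direct(n)
    assert math.isclose(closed, direct, abs_tol=1e-12)
    print(n, round(closed, 4))
```

For large N the closed form is roughly N divided by the square root of 12, which is the standard deviation of a uniform spread of ranks.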

10. Graphical Representation. We have shown that, if certain
statistics are given for two sub-sets,

    Subsets          $n_1$    $\bar{x}_1$    $\sigma_1$
                     $n_2$    $\bar{x}_2$    $\sigma_2$
    Composite set    $N$      $\bar{x}$      $\sigma_x$

the corresponding statistics for the composite set may be obtained
by means of (13) and (18a). We have been thinking of these statistics
as relating to distributions in the x-direction. The following
diagrams show how the means and standard deviations of three such
distributions may be represented geometrically by the points whose
ordinates are zero and whose abscissas are, respectively, $\bar{x}_1$, $(\bar{x}_1 \pm \sigma_1)$;
$\bar{x}_2$, $(\bar{x}_2 \pm \sigma_2)$; and $\bar{x}$, $(\bar{x} \pm \sigma_x)$. The points are plotted on three
different axes to avoid confusion, but they are to be thought of as
being referred to the same origin and plotted on the same scale.

It should be clear that Theorems I-IV (§9) will apply to distributions
in the y-direction as well as in the x-direction. In particular,
it is obvious that (13) and (18a) hold if we replace x by y. Then
the graphical representation of the means and standard deviations

    Subsets          $n_1$    $\bar{y}_1$    $\sigma_1$
                     $n_2$    $\bar{y}_2$    $\sigma_2$
    Composite set    $N$      $\bar{y}$      $\sigma_y$

is shown below.

It will be helpful to discuss one more notion in this connection.
Suppose the y composite set is made up of k sub-sets and the means
$\bar{y}_1, \bar{y}_2, \cdots, \bar{y}_k$ of these sub-sets are plotted on the y-axis as shown
by the labels on the left side of the axis in the figure on page 105.

We will denote the standard deviation of these means by $\sigma_{\bar{y}_i}$.
Then the points $\bar{y}$, $(\bar{y} \pm \sigma_{\bar{y}_i})$, and $(\bar{y} \pm \sigma_y)$ may be plotted as shown.
We would expect less variability among the means of the sub-sets
than among the y’s of the composite set, that is, that $\sigma_{\bar{y}_i}$ would be
less than $\sigma_y$. It is clear that (14) and (14a) hold when x is replaced
by y.

A grasp of these notions will help in the analysis of Table 27 which
the student is asked to make in problems 5 and 6 below.

Exercises

1. (a) Show that $\nu_1 = \dfrac{1}{N}\sum (x - x_0) = (\bar{x} - x_0)$.

(b) Derive equations (9) and (13). If $n_1 = n_2$, what does (13) reduce to?

2. Given the following information about two sets of data:

            I                     II
    $n_1 = 20$            $n_2 = 30$
    $\bar{x}_1 = 25$      $\bar{x}_2 = 20$
    $\sigma_1^2 = 5$      $\sigma_2^2 = 4$.

Find the mean and variance of the composite set.

3. Think of the two groups in Exercise 2, page 97, as combined into a single
set.

(a) Find the mean of the combined set by formula (18).

(b) Find the standard deviation of the combined set using result of (a) and
formula (13). Ans. $\bar{x} = 68.72$, $\sigma = 12.45$.

4. Using Theorem VI find the mean and standard deviation of the first 25
natural numbers.

5. Consider Table 27. Observe that the first and last columns form a frequency
distribution and that columns (1) to (8) are subdistributions whose totals
add up to N = 260 which is also the sum of the last column. Let $n_i$
represent the frequency in the ith column and answer the following
questions: $n_1 = ?$, $n_2 = ?$, $n_8 = ?$, $\sum_{1}^{8} n_i = ?$ Let $\bar{y}_i$ and $\sigma_i^2$ represent
mean and variance in the ith column. Find the mean and variance of
each of the columns (1) to (8), first in v units where $v = (y - 85)/10$.
Check your answers with those given at the bottom of the table.




106.12 191.83 246.48 283.63 257.65 294.51 222.53 71.43 = 303.11

6. Using formulas (18a) and (14) find the mean $\bar{y}$ and variance $\sigma_y^2$ of the total
distribution in Table 27 and check your answers with those given at the
bottom of the last column.

Hint. The student will observe that the means, $\bar{y}_i$, of the columns in
Table 27 are the values denoted by y in Exercise 3, page 97. The weighted
mean of these mean values is the mean of the whole table. That is,
from (18a),

    $\bar{y} = \dfrac{1}{N}\sum_{i=1}^{k} n_i\bar{y}_i = 87.31.$

The answer 56.66 (Exercise 3) is the variance, $\sigma_{\bar{y}_i}^2$, of the means of the
columns of Table 27 and is not to be confused with the variance $\sigma_y^2$ of the whole
table. In using (14), $\sigma^2$ is the variance of the whole table, $\sigma_i^2$ is the
variance of the ith column, and the expression $\sum_{i=1}^{k} n_i d_i^2$ equals $N\sigma_{\bar{y}_i}^2$ where
$\sigma_{\bar{y}_i}^2$ is the variance of the means of the columns since now $d_i = \bar{y}_i - \bar{y}$.

7. In Theorem V (§9) show that

    $\nu_2^{(i)} = \sigma_i^2 + d_i^2.$

Hence prove that (19) may be derived from (14) by showing that (14)
may be written as follows:

    $N\sigma^2 = \sum_{i=1}^{k} n_i(\sigma_i^2 + d_i^2).$

8. (a) Derive the following relation from (18a),

    $\bar{x}_1 = \dfrac{1}{n_1}\left[N\bar{x} - \sum_{i=2}^{k} n_i\bar{x}_i\right].$

What does this formula become when k = 2?

(b) Derive the following relation from (15),

    $\sigma_1^2 = \dfrac{1}{n_1}\left[N(\sigma^2 + \bar{x}^2) - n_2(\sigma_2^2 + \bar{x}_2^2)\right] - \bar{x}_1^2.$

9. In a certain distribution of N = 25 measurements it was found that $\bar{x} = 56$
inches and $\sigma = 2$ inches. After these results were computed it was
discovered that a mistake had been made in one of the measurements which
was recorded as 64 inches. Find the mean and standard deviation if the
incorrect variate, 64, is omitted.

Hint. Let $n_1 = 24$, $n_2 = 1$. Then $\bar{x}_2 = 64$ and $\sigma_2 = 0$. To find $\bar{x}_1$ and
$\sigma_1$ use formulas in Exercise 8 above.

10. If two or more variates are deleted from a distribution for which N, $\bar{x}$, and $\sigma$
are given, show how to compute the mean and variance of the remaining
variates.

11. Consider a composite set consisting of k sub-sets and let $\sigma_i^2$ and $n_i$ denote,
respectively, the variance and number of variates in the ith sub-set,
and $N = \sum_{i=1}^{k} n_i$.

(a) If the sub-sets have equal means, show that the variance of the composite
set is given by

    $\sigma^2 = \dfrac{1}{N}\sum_{i=1}^{k} n_i\sigma_i^2.$

(b) If the sub-sets each contain the same number of variates and have equal
means, show that

    $\sigma^2 = \dfrac{1}{k}\sum_{i=1}^{k} \sigma_i^2.$

CHAPTER VI 

TYPES OF DISTRIBUTIONS. THE NORMAL CURVE 

1. Skewness and Kurtosis. The shapes of frequency distributions 
are not all alike. Unimodal distributions may differ in two ways 
with respect to form. These differences can be described more easily 
if we think in terms of frequency curves. The curve may be quite 




symmetrical, or it may be vskew, bulging out on one side more than 
on the other. Secondly, the top of the curve may be narrow and 
peaked, or it may be somewhat flat giving a mound-shape effect. 

The mean and standard deviation are not sufficient to detect these 
characteristics, so we need other measures to describe them. Con- 
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Table 28 



A 

B 

► C 

u 

/ 

/ 

f 

-3 

0 

1 

0 

-2 

3 

1 

1 

-1 

6 

5 

10 

0 

7 

11 

6 

1 

6 

5 

5 

2 

3 

1 

2 

3 

0 

1 

1 

Sums 

25 

25 

25 


sider, for example, the three distributions of the weights (in class 
units) 1 of different breeds of mice 120 130 days old given in Table 28. 
Experiments on mice are important in cancer research. These dis- 
tributions are, however, some- 
what fictitious, being adapted 
from some actual data for pur- ^ 
poses of illustration. 

The student may easily verify 
that for each of these distribu- 
tions we find the same mean and 
standard deviation, namely, E 


-2 -1 


u = 0, 


= 1 . 2 . 


x 


-3 -2 -1 


X 


One may see from their his- 
tograms that these distributions 
are essentially different in shape 
even though they all have the 
same mean and standard devia- 
tion. These differences would 
be more pronounced if N were 
so large that the shapes ap- 
proached a regular and smooth 
form. Such a large value is 
called the “ population ” or “ universe ” and the value of N that 
we usually have at hand is a “ sample.” 

1 Neither the original unite nor the class interval need concern us here. 


[Fig. 20] 

Lack of symmetry in a distribution is known as "skewness." 
This characteristic is measured by $\alpha_3$. If a distribution is symmetrical 
$\alpha_3 = 0$, but $\alpha_3$ may be positive or negative depending upon whether 
the long tail of the distribution extends to the right or the left of the 
mean. (See Figure 18.) 

Figure 19 exhibits curves with different degrees of flatness or 
peakedness. The flatness that we are now describing is in the 
neighborhood of the mode and is not to be confused with the flatness 
of a curve as a whole which is due to spread or dispersion. 
The curves in Figure 19 all have the same spread. So their flatness 
depends upon the relative amount of material in the vicinity of the 
mode. This characteristic of a curve is called "kurtosis" and is 
measured by $\alpha_4$. By the calculus it can be demonstrated that $\alpha_4 = 3$ 
for a certain type of distribution which is called the normal curve. 
A frequency curve is said to have positive kurtosis if $\alpha_4 > 3$ and 
negative kurtosis if $\alpha_4 < 3$. It seems, however, that any combination 
of kurtosis and peakedness may occur. 1 The values of $\alpha_3$ and 
$\alpha_4$ computed for an observed distribution are useful in selecting the 
curve which will best represent the type to which that distribution 
belongs. 

Both $\alpha_3$ and $\alpha_4$ are abstract numbers and therefore skewness and 
kurtosis in different distributions may be compared by these measures. 
Therefore our definitions are 

(1)    $\alpha_3$ is a measure of skewness, 
       $\alpha_4$ is a measure of kurtosis. 
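
As an illustration of how these two measures are computed for a frequency distribution, a short Python sketch follows. It assumes the usual moment definitions $\alpha_3 = \mu_3/\mu_2^{3/2}$ and $\alpha_4 = \mu_4/\mu_2^2$, with moments taken about the mean and divisor N; the function name and both sample distributions are hypothetical and serve only to show the sign and size of the results.

```python
def moment_ratios(values, freqs):
    """Return (alpha3, alpha4) for a frequency distribution.

    Assumes alpha3 = mu3 / mu2**1.5 and alpha4 = mu4 / mu2**2,
    where mu_k is the k-th moment about the mean (divisor N).
    """
    n = sum(freqs)
    mean = sum(f * v for v, f in zip(values, freqs)) / n

    def central_moment(k):
        return sum(f * (v - mean) ** k for v, f in zip(values, freqs)) / n

    mu2, mu3, mu4 = central_moment(2), central_moment(3), central_moment(4)
    return mu3 / mu2 ** 1.5, mu4 / mu2 ** 2


# Hypothetical distributions: one symmetrical, one with a long right tail.
symmetric = moment_ratios([-1, 0, 1], [1, 2, 1])
right_tailed = moment_ratios([0, 1, 2, 3, 4], [4, 8, 5, 2, 1])
print("symmetric   :", symmetric)
print("right tailed:", right_tailed)
```

The symmetrical distribution should give $\alpha_3 = 0$, and the one with the long right tail a positive $\alpha_3$.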

1 I. Kaplansky, "A Common Error Concerning Kurtosis," J. Amer. Stat. 
Assoc., vol. 40, p. 259, June 1945. In this connection, Professor I. W. Burr 
comments: "The shape of the hump of the curve has less influence on $\alpha_4$ than 
does the length of the tails. In Figure 19, the curve with $\alpha_4 = 4.5$ should have 
the longest tails." 

For an unsymmetrical distribution the distance between the mean 
and mode may be used to measure the degree of asymmetry or skewness, 
because the mean and mode coincide in a symmetrical distribution. 
Since we wish any measure of skewness to be a pure number, 
we would express this distance in units of the standard deviation, 
thus (mean - mode)/$\sigma$. Now it happens that there is a certain 
curve known as Pearson's Type III which is used to represent certain 
skew distributions, and it can be shown by higher mathematics that, 
for this curve, 

(2)    $\dfrac{\text{mean} - \text{mode}}{\sigma} = \dfrac{\alpha_3}{2}.$ 

So this relation 1 may be used as a formula for obtaining the approximate 
mode. 
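
Solving relation (2) for the mode gives mode = mean - $\sigma\alpha_3/2$. A minimal Python sketch of this use of the relation follows; the function name and the numerical values are hypothetical, and the result is only an approximation resting on the Type III assumption.

```python
def approximate_mode(mean, sd, alpha3):
    """Approximate mode from relation (2): (mean - mode) / sd = alpha3 / 2."""
    return mean - sd * alpha3 / 2


# Hypothetical values: mean 50, standard deviation 10, alpha3 = 0.4.
mode = approximate_mode(50.0, 10.0, 0.4)
print(mode)
```

With a positive $\alpha_3$ the approximate mode falls below the mean, in keeping with a long tail to the right.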

Exercise 

Find $\alpha_3$ and $\alpha_4$ for each of the distributions A, B, and C, in Table 28. 

2. Frequency Curves. As the student extends his experience he 
finds several types of distributions. It is important in certain problems 
to differentiate between them. Differences in type lead to the 
study of frequency curves. There are several standard curves to 
represent the different types of distributions that arise in practical 
statistics. 2 Each of these is specified by a mathematical function 
y = f(x) where f(x) is a general symbol for any function of x. It is, 
of course, a different expression for each of the different curves. 



Such functions are also called distribution functions. A complete 
discussion of this subject belongs to the field of advanced statistics. 
However, there are some simple concepts relating to frequency 
curves which will be useful in our work. 

If a frequency curve is used to represent a given distribution, the 
total area under the curve corresponds to the total frequency N, 
and therefore the partial area under the curve between the ordinates 
erected at x = a and x = b (Figure 21) represents the number of 
variates with measurement or character between a and b. The limits 
between which the theoretical distribution ranges are denoted by 
h and k. It is often convenient and causes no loss of generality to 
suppose that the total area under the curve is unity or 100%, in 
which case the partial area between a and b represents the percentage 
of variates having the given character. 

1 Because of this relation some writers use $\alpha_3/2$ as a measure of skewness 
instead of $\alpha_3$. Also some authors adopt a different convention as to sign, defining 
skewness as negative when the mean is greater than the mode. 

2 See Chapter III, Part II. 

In mathematical language the "area under f(x) between a and b" 
is called the "integral of f(x) from a to b," and is denoted by the 
symbol 

    $\int_a^b f(x)\,dx.$ 

However, we will abbreviate this symbol and use merely $\int_a^b$ to denote 
such an area. 

Without attempting to be rigorous, we may say that the total area 
under the curve is the limit of the area of the appropriate histogram 
whose rectangles have bases $\Delta x$ and altitudes f(x), as $\Delta x$ is taken 
smaller and smaller and approaches zero. Thus 

    $\int f(x)\,dx = \lim_{\Delta x \to 0} \sum f(x)\,\Delta x.$ 

The integral sign $\int$ is a conventionalized S and denotes the sum 
of elements of area with bases dx and altitudes y = f(x). The letters 
written at the top and bottom of $\int$ denote the range over which 
the sum is to be taken. Therefore the notation $\int_a^b y\,dx$ or $\int_a^b f(x)\,dx$ 
represents the area which is bounded by the curve y = f(x), the 
ordinates at x = a and x = b, and the x-axis. (Figure 21.) 

The integral of y = f(x) from h to k denotes the total frequency N. 
Therefore, 

    $N = \int_h^k f(x)\,dx.$ 

Hence, the proportion of variates having some character x, such that 
a < x < b, is given by $\frac{1}{N}\int_a^b$. If N is taken as unity or 100%, then 
$\int_a^b$ denotes the percentage of variates having the given character. 

The integral represented by this symbol also denotes the probability 
that a variate chosen at random from the universe y = f(x) will 
have a value between a and b. 
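
The idea of an area as the limit of a sum of rectangles may be illustrated numerically. In the Python sketch below the curve f(x) = 6x(1 - x) on the range 0 to 1 is a hypothetical frequency curve chosen only because its total area is unity; the function names are assumptions, and the altitude of each rectangle is taken at the middle of its base, which is one convenient choice among several.

```python
def f(x):
    """Hypothetical frequency curve on 0 <= x <= 1 with total area 1."""
    return 6 * x * (1 - x)


def rectangle_sum(func, a, b, rectangles):
    """Sum of rectangle areas with equal bases dx and mid-base altitudes."""
    dx = (b - a) / rectangles
    return sum(func(a + (i + 0.5) * dx) for i in range(rectangles)) * dx


# The sum settles down as the bases are taken smaller and smaller.
for rectangles in (4, 16, 64, 256):
    print(rectangles, rectangle_sum(f, 0.25, 0.75, rectangles))

total = rectangle_sum(f, 0.0, 1.0, 1000)
proportion = rectangle_sum(f, 0.25, 0.75, 1000)
print(total, proportion)
```

Since the total area is unity, the second printed value may be read directly as the proportion of variates between a = 0.25 and b = 0.75.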

3. The Normal Curve. Perhaps the most important of all frequency 
curves is the so-called normal 1 curve whose equation may be 
written 

(3)    $y = K e^{-h^2 (x - m)^2}$ 

where K, $h^2$, and m represent numbers whose significance will be 
explained presently. The curve is bell-shaped and is symmetrical 
about the line x = m. It was first discovered by a famous French 
mathematician, De Moivre, over two hundred years ago and published 
in 1733. He obtained it while working on certain problems 
in games of chance which were proposed to him by the gamblers of 
his day. Because of this origin and because the data from certain 
coin- and dice-throwing experiments closely approach it in form, it 
is often called the normal probability curve. Actual statistical use 
of the normal curve began with the work of the famous mathematical 
astronomers, Laplace (1749-1827) and Gauss (1777-1855), each of 
whom derived it independently and presumably without knowing of 
De Moivre's treatment. 2 They found that it represented very well 
the errors of observation in the physical sciences. For this reason 
it has been called the normal curve of error, where error is used in 
the sense of a deviation from the true value. Since that time experience 
has shown that it serves quite well to describe many of the distributions 
which arise in the fields of biology, education, and sociology. 
Much of the theory of statistics is built around it. 

1 The term "normal" used here should not be interpreted to mean that other 
types of distribution are abnormal. 

2 For a more extensive history see (a) "Bi-centenary of the Normal Curve," 
Jour. Amer. Statistical Assoc., vol. 29 (1934), pp. 72-75. (b) "Mathematical 
Statistics" (Carus Monograph), Rietz, Ch. 3. 

The calculus is required to define the moments of a theoretical 
distribution specified by a frequency curve y = f(x). (These definitions 
are given in Part II.) It turns out that the mean of the distribution 
specified by (3) is m and its variance is $1/(2h^2)$. The 
constant K is determined so that the area under the curve shall have 
some relevant value. In describing an observed distribution by 
means of a normal curve, we wish to have the number of area units 
under the curve (3) equal to the number N of observed variates. 
When this condition is imposed, $K = Nh/\sqrt{\pi}$ and we see that K 
depends also on h. If we adopt the same notation 1 here as we used 
for an observed distribution, we have 

    $m = \bar{x}, \quad h^2 = \dfrac{1}{2\sigma_x^2}, \quad K = \dfrac{N}{\sigma_x\sqrt{2\pi}}.$ 

Upon making these replacements, (3) becomes 

(3a)    $y = \dfrac{N}{\sigma_x\sqrt{2\pi}}\; e^{-(x - \bar{x})^2 / 2\sigma_x^2}.$ 
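
The relations among K, h, and $\sigma_x$, and the requirement that the area under (3a) equal N, may be checked numerically. The following Python sketch is illustrative only: the values N = 1000, mean 50, and standard deviation 5 are hypothetical, and the area is approximated by a sum of narrow rectangles over a range of eight standard deviations on either side of the mean.

```python
import math


def normal_frequency_curve(x, n, mean, sd):
    """Ordinate y of equation (3a) for total frequency n."""
    return n / (sd * math.sqrt(2 * math.pi)) * math.exp(-((x - mean) ** 2) / (2 * sd ** 2))


# Hypothetical parameter values, chosen only for illustration.
n, mean, sd = 1000, 50.0, 5.0

h = 1 / (math.sqrt(2) * sd)                    # from h^2 = 1 / (2 sd^2)
k_from_h = n * h / math.sqrt(math.pi)          # K = N h / sqrt(pi)
k_from_sd = n / (sd * math.sqrt(2 * math.pi))  # K = N / (sd sqrt(2 pi))

# Area under (3a) by rectangles with mid-base altitudes.
steps = 4000
lower, upper = mean - 8 * sd, mean + 8 * sd
dx = (upper - lower) / steps
area = sum(
    normal_frequency_curve(lower + (i + 0.5) * dx, n, mean, sd) for i in range(steps)
) * dx

print(k_from_h, k_from_sd, area)
```

The two values of K should agree, and the area should come out very nearly equal to N.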


4. Standard Form. The letters $\pi$ and e represent numbers which 
always have the same values (see §1, Chapter I). But each of the 
letters m, h, and K may take on different values in different situations. 
Such constants are called parameters, and (3) really represents 
a family of curves. Similarly, in (3a), $\bar{x}$, $\sigma_x$, and N are 
parameters. For assigned values they determine, respectively, the 
position of the curve along the x-axis, its steepness, and its "size" 
but they do not have anything to do with its fundamental characteristics 
(i.e., those properties which differentiate it from all other 
curves). In order to study these characteristic properties it is 
convenient to represent the curve by an equation which will be independent 
of the parameters; in other words, to eliminate them from 
the equation by a transformation. This is accomplished by considering 
the total area under the curve as unity, taking the origin at 
the mean, and using the standard deviation as the unit of horizontal 
measurement. In mathematical language this means that we set 
N = 1, and $t = (x - \bar{x})/\sigma_x$. We will denote the resulting function 
by $\phi(t)$, that is, 

(4)    $\phi(t) = \dfrac{1}{\sqrt{2\pi}}\, e^{-t^2/2}$ 

which is called the standard form of the normal curve. 

A variable, t, which is distributed in accord with (4) is said to be 
normally distributed with mean zero and unit standard deviation. 

Just as coordinates of points on the curve are denoted by (x, y) 
in the case of equation (3a), so in equation (4) t refers to abscissas and 
$\phi(t)$ refers to ordinates. The relation between the two systems of 
coordinates is given by 

(5)    $x = \bar{x} + \sigma_x t$ 

for abscissas, and 

(6)    $y = \dfrac{N}{\sigma_x}\,\phi(t)$ 

for ordinates. Equation (6) follows from (3a) and (4). If the area 
under the curve is taken as unity, then $y = \phi(t)/\sigma$, that is, $\phi(t) = \sigma y$. 
This says that since the abscissas are compressed by $\sigma$ in changing 
from arbitrary units into standard units, so the ordinates must be 
stretched by $\sigma$ if the area under the curve is to be the same in the two 
scales of measurement. 

1 In the theory of sampling, Part II, it is necessary to distinguish the moments 
of a sample from those of the parent universe by the use of different symbols. 
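
The change of scale described by (5) and (6) can be sketched in Python as follows. The function names are assumptions made for this illustration, and the values N = 1500, mean 75, and standard deviation 10 are used merely as convenient numbers; the sketch checks that an ordinate obtained through $\phi(t)$ and (6) agrees with the one computed directly from (3a).

```python
import math


def phi(t):
    """Standard form (4) of the normal curve."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


def to_standard_units(x, mean, sd):
    """t = (x - mean) / sd."""
    return (x - mean) / sd


def to_arbitrary_units(t, mean, sd):
    """Equation (5): x = mean + sd * t."""
    return mean + sd * t


def ordinate_from_phi(t, n, sd):
    """Equation (6): y = (N / sd) * phi(t)."""
    return n / sd * phi(t)


n, mean, sd = 1500, 75.0, 10.0  # illustrative values
x = 82.0
t = to_standard_units(x, mean, sd)
y_via_standard_form = ordinate_from_phi(t, n, sd)
y_direct = n / (sd * math.sqrt(2 * math.pi)) * math.exp(-((x - mean) ** 2) / (2 * sd ** 2))
print(t, y_via_standard_form, y_direct)
```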

5. Tables of Standard Ordinates and Areas. One of the reasons 
for writing the equation in standard form is that the ordinates and 
areas may be tabulated once and for all. These tables are given in 
the Appendix. We see from (4) that $\phi(-t) = \phi(+t)$, i.e., the ordinates 
for negative values of t are the same as for the corresponding 
positive values of t, and the curve is symmetrical about the ordinate 
at t = 0. Therefore it is necessary to tabulate values of $\phi(t)$ for 
positive t's only. Equation (4) may be graphed by plotting the 
points corresponding to a few well chosen values from the tables 
and drawing a smooth curve through them. (Figure 22.) 

The curve approaches very close to the horizontal axis at each 
extremity but is asymptotic, that is, it does not quite touch the axis 
no matter how far extended. We say its limits are at $-\infty$ and 
$+\infty$. Although the infinite abscissal range is never met in practice 
it may be characteristic of the "universe" from which a given 
distribution is a sample. Therefore, this infinite feature is useful in 
theoretical investigations. Moreover, even in representing observed 
distributions the infinite range causes no practical difficulty because 
the curve comes down to the horizontal axis very rapidly beyond 
t = ±3. The combined area of the two extremities beyond t = ±3 is 
only .27 of 1% of the total area under the curve. 

Partial areas between ordinates erected at various values of t, say 
between t = a and t = b, are denoted by $\int_a^b$. Thus the area from 
t = 0 to t = 1 is given by $\int_0^1 = .3413$. (See Table I, Appendix.) 

Since the total area under $\phi(t)$ is taken as unity the area on either 
side of t = 0 is 0.5 and it is only necessary to tabulate the areas $\int_0^t$ 
for positive values of t. Thus the area from t = -1 to t = 0 is equal 
to the area from t = 0 to t = 1. In symbols this would be stated 
as follows: 

    $\int_{-1}^{0} = \int_{0}^{1}.$ 

Any other areas required may be found by an appropriate addition 
or subtraction of tabular values. For example, suppose the area 
below t = -2 is required. This is denoted by $\int_{-\infty}^{-2}$. Now the area 
from $-\infty$ to -2 equals 0.5 minus the area from -2 to 0. And the 
area from -2 to 0 is the same as from 0 to 2. That is, 

    $\int_{-\infty}^{-2} = .5 - \int_{0}^{2} = .5 - .47725 = .02275.$ 

Both areas and ordinates for decimal values of t between tenths may 
be approximated by interpolating between the values given in the 
tables. 

The illustrative examples following §6 will help the student 
become familiar with the tables. He should verify the answers and 
draw a simple sketch of the curve showing the ordinates or areas in 
each case. 

The symbol $\int_{-\infty}^{t}$ denotes a cumulative relative frequency, i.e., 
the percentage of the total frequency N which is less than t. In order 
to find values of $\int_{-\infty}^{t}$ from the tables, for assigned values of t, the 
student should observe (from a figure) that 

    $\int_{-\infty}^{t} = .5 \pm \int_{0}^{|t|},$ 

the plus or minus sign to be used according as t is positive or negative. 
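
Where the Appendix tables are not at hand, the same ordinates and areas may be computed directly. The Python sketch below is offered only as an illustration of the rules just stated: it relies on the known identity that the area under $\phi(t)$ from 0 to t equals one half of erf(t/$\sqrt{2}$), and the function names are assumptions.

```python
import math


def phi(t):
    """Ordinate of the standard normal curve, equation (4)."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


def area_zero_to(t):
    """Area under phi between 0 and t, for t >= 0 (the tabulated quantity)."""
    return 0.5 * math.erf(t / math.sqrt(2))


def cumulative(t):
    """Area from minus infinity to t: 0.5 plus or minus the tabulated area."""
    tabular = area_zero_to(abs(t))
    return 0.5 + tabular if t >= 0 else 0.5 - tabular


print(round(phi(0), 4))
print(round(area_zero_to(1), 4))
print(round(cumulative(-2), 5))
print(round(1 - (cumulative(3) - cumulative(-3)), 4))
```

The last line gives the combined area beyond t = ±3, which may be compared with the figure of .27 of 1% quoted above.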

6. Properties. A knowledge of the properties of the normal curve 
is essential for an intelligent use of the curve in practical statistics. 
A demonstration of some of these properties is beyond the scope of 
the present discussion although quite simple in the calculus. The 
following properties are the most important and interesting. 

1. The mean, median, and mode coincide at t = 0. The height 
of the maximum ordinate in standard form is $1/\sqrt{2\pi}$ because when 
t = 0, $\phi(0) = 1/\sqrt{2\pi} = .3989$. 

2. Since the standard deviation is the unit of measurement along 
the horizontal axis, $\sigma_x = 1$ in the t scale. Any t value may be converted 
into the corresponding x value by (5). In the vertical direction 
$N/\sigma$ is the unit of measurement and any $\phi(t)$ ordinate may be 
converted into y units by means of (6). 

The area under (3) in the range from x = c to x = d is denoted by 

    $\int_c^d y\,dx.$ 

If t = a and t = b denote the corresponding range in standard units, 
then 

(7)    $\int_a^b \phi(t)\,dt$ 

denotes the corresponding area, in standard units, under (4). It is 
shown in the calculus that $dx = \sigma_x\,dt$. Therefore from (6) we have 

(8)    $\int_c^d y\,dx = N \int_a^b \phi(t)\,dt.$ 

If the interval goes from x = c to x = d, (8) says that 

(9)    Frequency over (c, d) $= N \int_a^b$ 

where 

(10)    $a = (c - \bar{x})/\sigma_x, \quad b = (d - \bar{x})/\sigma_x.$ 

This merely means that the percentages (relative frequencies) obtained 
from the tables may be converted into numbers (frequencies) 
by multiplying the percentages by N. 
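
Relations (9) and (10) may be sketched in Python as shown below. The sketch uses `statistics.NormalDist` from the Python standard library in place of the Appendix tables; the function name and the interval chosen are assumptions made only for illustration.

```python
from statistics import NormalDist


def frequency_over(c, d, n, mean, sd):
    """Frequency over (c, d) by (9), with a and b found from (10)."""
    a = (c - mean) / sd
    b = (d - mean) / sd
    standard = NormalDist()  # mean zero, unit standard deviation
    return n * (standard.cdf(b) - standard.cdf(a))


# Illustrative values: N = 1500, mean 75, standard deviation 10,
# and the interval from one standard deviation below to one above the mean.
freq = frequency_over(65.0, 85.0, 1500, 75.0, 10.0)
print(round(freq, 1))
```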

3. The curve changes from concave to convex at t = ±1. In the 
x-scale, referred to the origin of x, these points are at $x = \bar{x} \pm \sigma_x$. 
They are called points of inflection and their position is important 
in making an accurate drawing of the curve. 

4. The standard deviation is approximately 25% greater than 
the mean deviation. More precisely, $MD = \sigma\sqrt{2/\pi} = .798\sigma$. 
($\sigma/MD = \sqrt{\pi/2} = 1.2533$.) 

5. The quartiles, $Q_1$ and $Q_3$, are equidistant from t = 0 and therefore 
from the mean. By definition 

    $Q_3$ is that value of t for which $\int_{-\infty}^{t} = .75$, 

i.e., for which $\int_0^t = .25$. From the tables this is t = .6745. Therefore 
in arbitrary units, 

    $Q_3 = \bar{x} + .6745\sigma_x$ and $Q_1 = \bar{x} - .6745\sigma_x$. 

6. The quartile deviation (semi-interquartile range) for a normal 
distribution will be denoted by E. Its value is 

    $E = \dfrac{Q_3 - Q_1}{2} = \dfrac{(\bar{x} + .6745\sigma) - (\bar{x} - .6745\sigma)}{2} = .6745\sigma.$ 

In standard units this is $s = E/\sigma = .6745$. 
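
The value t = .6745 and the resulting quartiles can be checked with a short Python sketch. It uses `statistics.NormalDist.inv_cdf` from the standard library rather than inverse interpolation in the tables; the mean of 100 and standard deviation of 15 are hypothetical values.

```python
from statistics import NormalDist

# Value of t below which 75% of the area under the standard curve lies.
t_q3 = NormalDist().inv_cdf(0.75)


def quartiles(mean, sd):
    """First and third quartiles of a normal distribution in arbitrary units."""
    return mean - t_q3 * sd, mean + t_q3 * sd


q1, q3 = quartiles(100.0, 15.0)  # hypothetical mean and standard deviation
quartile_deviation = (q3 - q1) / 2
print(round(t_q3, 4), round(q1, 2), round(q3, 2), round(quartile_deviation, 2))
```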

7. The quantity E (or s) has a significance in probability theory. 
If a variable x is distributed according to the normal curve, the 
probability is one half that a variate selected at random will have a 
value between $\bar{x} - E$ and $\bar{x} + E$. The reason for this statement is 
that 50% of the variates have values within this range. E is commonly, 
though somewhat ambiguously, called "probable error." 

8. E is in units of x whereas s is a value of t, that is, s is the value 
t = .6745, and E is the value $x = .6745\sigma_x$. Just as $\sigma_x$ may be used 
as a yardstick in scaling off a distribution on either side of the mean 
(§6, Chapter V), so may E or s be used in a similar manner. When 
thinking of them in this way it is useful to regard E as a yardstick 
about two-thirds the length of $\sigma_x$. The following table gives the 
end-points of certain intervals in t, x', and x units, respectively, where 
$t = x'/\sigma_x$ and $x' = x - \bar{x}$. 


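End Points of Certain Intervals in t, x', x 

        When $\sigma$ is the unit         |         When E is the unit 
   t       x'            x               |     t          x'                x 
   0       0          $\bar{x}$          |     0          0             $\bar{x}$ 
  ±1     ±$\sigma$    $\bar{x} \pm \sigma$   |  ±.6745   ±.6745$\sigma$   $\bar{x} \pm .6745\sigma$ 
  ±2     ±2$\sigma$   $\bar{x} \pm 2\sigma$  |  ±1.349   ±1.349$\sigma$   $\bar{x} \pm 1.349\sigma$ 
  ±3     ±3$\sigma$   $\bar{x} \pm 3\sigma$  |  ±2.023   ±2.023$\sigma$   $\bar{x} \pm 2.023\sigma$ 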


The percentage distribution of area under the normal curve is 
given (approximately) in Figure 23 where $\sigma_x$ is the unit of measurement 
along the horizontal axis and in Figure 24 where s is the unit. 
The percentages given in the figures may be regarded as abridged 
tables. Of course the tables in the Appendix will ordinarily be used 
in problems. 

With reference to Figure 23, it is sometimes said that if values of x 
are normally distributed, the probability that a value chosen at 
random will fall within the range $x_1 < x < x_2$, where $x_1 = \bar{x} - \sigma_x$ 
and $x_2 = \bar{x} + \sigma_x$, is .68. 

9. Astronomers and physicists have called h the "modulus of 
precision." From the relation $h = 1/(\sqrt{2}\,\sigma)$, it is evident that h 
increases as $\sigma$ decreases. And as h increases, the curve (with N and 
m kept constant) becomes narrower in the neighborhood of m and 
in this sense h measures the closeness of the values of x to their mean. 

[Fig. 23 and Fig. 24: percentage distribution of area under the normal curve; 
horizontal axis marked from $-3\sigma$ to $3\sigma$ in Fig. 23 and from -4s to 4s in Fig. 24] 

10. The curve is symmetrical and $\alpha_3 = 0$. The fourth moment 
about the mean is equal to three times the square of the second 
moment about the mean, i.e., $\mu_4 = 3\mu_2^2$ and therefore $\alpha_4 = \mu_4/\mu_2^2 = 3$. 
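
Properties 4 and 10, whose demonstration belongs to the calculus, may at least be checked numerically. The Python sketch below approximates the required areas by sums of narrow rectangles between t = -10 and t = 10; the helper `integrate` and the choice of range and number of rectangles are assumptions made for this illustration only.

```python
import math


def phi(t):
    """Standard form (4) of the normal curve."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


def integrate(g, lower=-10.0, upper=10.0, steps=20000):
    """Approximate area under g by rectangles with mid-base altitudes."""
    dt = (upper - lower) / steps
    return sum(g(lower + (i + 0.5) * dt) for i in range(steps)) * dt


mean_deviation = integrate(lambda t: abs(t) * phi(t))  # MD in units of sigma
mu2 = integrate(lambda t: t ** 2 * phi(t))
mu3 = integrate(lambda t: t ** 3 * phi(t))
mu4 = integrate(lambda t: t ** 4 * phi(t))

print(round(mean_deviation, 4), round(1 / mean_deviation, 4))
print(round(mu3 / mu2 ** 1.5, 4), round(mu4 / mu2 ** 2, 4))
```

The first line should reproduce the ratios .798 and 1.2533 of property 4, and the second the values of $\alpha_3$ and $\alpha_4$ stated in property 10.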


Examples 

1. Find the ordinates of $\phi(t)$ for (a) t = 2.3, (b) t = -2.3, (c) t = .67. 

Solutions from the tables in the Appendix: 

(a) $\phi(2.3) = .02833$ 

(b) $\phi(-2.3) = .02833$ 

(c) $\phi(.67) = .31874$ 

2. Find the following areas under $\phi(t)$ and use the integral notation: 

(a) From t = 0 to t = 3.00 

(b) From t = 1.5 to t = 2.5 

(c) From t = -2 to t = 1.3 

(d) From t = 0 to t = 0.6745 

Solutions from the tables: 

(a) The required area is given by $\int_0^3$ which we find to be .49865. 

(b) From t = 0 to t = 1.5, $\int_0^{1.5} = .43319$, and from t = 0 to t = 2.5, 
$\int_0^{2.5} = .49379$. Therefore the required area is 

    $\int_{1.5}^{2.5} = \int_0^{2.5} - \int_0^{1.5} = .0606.$ 

(c) Since the area from t = 0 to t = -2 is the same as from t = 0 to 
t = +2 we have 

    $\int_{-2}^{1.3} = \int_0^{2} + \int_0^{1.3} = .47725 + .40320 = .88045.$ 

(d) Here we must interpolate: 

    For t = .67,    $\int_0^t = .24857$ 
    For t = .6745,  $\int_0^t = A$ (say) 
    For t = .68,    $\int_0^t = .25175$. 

Therefore 

    $\dfrac{A - .24857}{.25175 - .24857} = \dfrac{.0045}{.01}$ 

whence 

    $A = .25000.$ 

3. Show that for equation (3), the percentages of area outside the given ranges 
are as stated below: 

    Above $\bar{x} + \sigma$ = 15.87% 
    Outside $\bar{x} \pm \sigma$ = 31.74% 
    Outside $\bar{x} \pm 2\sigma$ = 4.56% 
    Outside $\bar{x} \pm 3\sigma$ = 0.27% 

Solution: Converting these ranges into t units, and remembering that only 
the positive half of the area under $\phi(t)$ is tabulated and equals .5, we have 

    Area above t = 1 is $.5 - \int_0^1 = .1587 = 15.87\%$ 
    Area outside t = ±1 is 2(15.87%) = 31.74% 
    Area outside t = ±2 is $2\left(.5 - \int_0^2\right) = .0456 = 4.56\%$ 
    Area outside t = ±3 is $2\left(.5 - \int_0^3\right) = .0027 = 0.27\%$ 

4. Given N = 1500, $\bar{x}$ = 75, $\sigma_x$ = 10. If the variates are distributed according 
to the normal curve, (a) find the value of x for which cum f = 800, (b) for 
which cum f = 450, (c) how many of the N variates lie where x < 80? 

Solutions: 

(a) By definition, cum f $= \int_{-\infty}^{x} y\,dx$, and from (8), 

    $\int_{-\infty}^{t} = \dfrac{800}{1500} = .5333.$ 

Therefore $\int_0^t = .0333$, whence from the tables, t = .083. 
Substituting in equation (5), x = 75.83. 

(b) We have $\int_{-\infty}^{t} = 450/1500 = .3$ and t is negative. Since 
$\int_{-\infty}^{t} = .5 - \int_0^{|t|}$, we have $\int_0^{|t|} = .2$, whence we find that 

    t = -.524,    x = 69.76. 

(c) From the relation $t = (x - \bar{x})/\sigma_x$ we find that t = .5 when x = 80. 
From the tables, $\int_{-\infty}^{.5} = .69146$. From (8) we have 

    $\int_{-\infty}^{80} y\,dx = 1500(.69146) = 1037.2.$ 
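
The three parts of Example 4 may be reworked in Python as a check. This sketch substitutes `statistics.NormalDist` from the standard library for the Appendix tables, so the results may differ from the tabular answers in the last decimal place; the variable names are assumptions.

```python
from statistics import NormalDist

N, MEAN, SD = 1500, 75.0, 10.0
fitted = NormalDist(mu=MEAN, sigma=SD)

# (a) and (b): value of x below which a given cumulative frequency lies.
x_a = fitted.inv_cdf(800 / N)
x_b = fitted.inv_cdf(450 / N)

# (c): number of variates with x < 80.
count_c = N * fitted.cdf(80.0)

print(round(x_a, 2), round(x_b, 2), round(count_c, 1))
```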


Exercises 


1. Find $\phi(2.65)$, $\phi(-1.46)$, $\phi(0)$. 

2. Find t if $\phi(t)$ = .1257, .0325, .0034, respectively. 

3. Find the following areas under $\phi(t)$, and draw a figure in each case: 


(a) f » /* 12 ’ f 2 ’ f > f ‘ 
JO J- 1.2 «/- oo J 1.2 t/— 1.2 

X .37 /\6745 

’ 

.37 J — .6745 


-.37 «/— .6745 


4. Find t, given the partial areas: 

    $2\int_0^t = .5$,    $\int_0^t = .27457$,    $\int_{-t}^{t} = .999730$. 

5. Verify the percentages given in Figures 23 and 24. 

6. (a) How far from the median of a normal distribution is the first quartile? 

(b) In a certain normal distribution $\bar{x}$ = 89 and $Q_1$ = 75.61. What is $\sigma_x$? 

7. For a normal distribution: N = 1000, $\bar{x}$ = 20, $\sigma_x$ = 2. 

(a) What is E? 

(b) Find the value of Q t . 

(c) What values of x will include the middle 500? 

(d) The middle 75%? 

8. If N = 300, $\bar{x}$ = 75, $\sigma_x$ = 15, for a normal distribution: 

(a) What is the value of the first quartile? 

(b) The third quartile? 

(c) How many variates are between x = 60 and x = 90? 

9. In a college the 8 grades A, A-, B, B-, C, C-, D, and F are given. 
On the assumption that mathematical ability is normally distributed, 
how many out of a total of 1000 should receive each grade? Assume 
that $\bar{x}$ is the boundary between the C and B- grades and that each grade 
interval is $.8\sigma$. What range in standard units on either side of $\bar{x}$ is thereby 
assumed to include all the grades? 

10. What are the percentages of a normal distribution outside $\bar{x} \pm t\sigma$ for 
t = 1, 2, 3? 


7. Curve Fitting. It should be remembered that a set of data 
collected and presented in the form of a frequency distribution is 
merely a sample of a general type called its universe. Other samples 
from that universe might yield somewhat different frequency distri- 
butions. 

For certain purposes it may be desirable to fit a normal curve to 
a unimodal distribution which is reasonably symmetrical and appears 
to be of the normal type. The theoretical curve idealizes the recal- 
citrant observational data and smooths out the irregularities due to 
sampling fluctuations. 

In fitting equation (3a) to a given distribution, we assume that 

(1) The given frequency N represented by a histogram equals the area 
under the curve, and 

(2) The mean and standard deviation of the observed distribution 
equal, respectively, the mean and standard deviation of the theoretical 
distribution represented by the curve. 

A normal curve is a mathematical model of a hypothetical universe. 
In identifying such a universe with (3a) only its form is 
specified by the model. The parameters are (usually) unknown. 
An estimate of a parameter by the use of an appropriate function of 
the observed data is called a statistic. Assumption (2) above means, 
then, that we replace each of the parameters by the corresponding 
statistic. 1 

The procedure of fitting a normal curve to an observed distribution 
will now be illustrated with the data of Table 21, p. 88. We 
substitute 

    $\bar{x}$ = 47.712 
    $\sigma_x$ = 5.772 
    N = 1000 

in equation (3a), and obtain 

    $y = \dfrac{1000}{5.772\sqrt{2\pi}}\; e^{-(x - 47.712)^2 / 2(5.772)^2}.$ 

To make use of a table of standard ordinates in graphing this 
equation we transform it into standard units by setting 

(a)    $t = \dfrac{x - 47.712}{5.772} = .17325x - 8.2661$ 

and write 

(b)    $y = \dfrac{N}{\sigma_x}\,\phi(t) = 173.25\,\phi(t).$ 

Appropriate values to assign x in equation (a) are the end-x and 
mid-x values of the given distribution. The use of a computing 
machine in changing x values into corresponding t values is explained 
in §6, Chapter IV. Thus we obtain the values in the second column 
of Table 29. We may then enter the table in the Appendix 
for the corresponding ordinates, $\phi(t)$. These are converted into y 
values by equation (b). The curve may then be drawn by plotting 
the x and y values. (Figure 25.) The curve should be drawn so 
as to be symmetrical with respect to the ordinate at the mean and 
its points of inflection should be at a distance from the mean equal 
to $\sigma$. The student should observe that every pair of (x, y) values 
computed in Table 29 furnishes two points for the graph, each 
symmetrical to the other with respect to the mean ordinate. Both 
points should be used in drawing the curve but only the computed 
points should be left permanently in the graph. 

After the curve is drawn, the histogram for the observed data may 
be constructed. The column headed f/c gives the heights of the 
rectangles on the same scale as the ordinates of the curve. 

Table 29.  t = .17325x - 8.2661,  y = 173.25 $\phi(t)$ 

     x         t       $\phi(t)$      y       f/c 
    27.5    -3.502     .00086       0.15 
    29.5    -3.155     .00275       0.48      0.25 
    31.5    -2.809     .00772       1.34 
    33.5    -2.462     .01927       3.34      3.50 
    35.5    -2.116     .04253       7.37 
    37.5    -1.769     .08344      14.46     14.00 
    39.5    -1.423     .14494      25.11 
    41.5    -1.076     .22361      38.74     43.00 
    43.5    -0.730     .30563      52.95 
    45.5    -0.383     .37072      64.23     61.25 
    47.5    -0.037     .39866      69.07 
    49.5     0.310     .38023      65.87     65.75 
    51.5     0.656     .32230      55.84 
    53.5     1.003     .24124      41.79     39.00 
    55.5     1.349     .16060      27.82 
    57.5     1.696     .09469      16.41     16.75 
    59.5     2.042     .04960       8.59 
    61.5     2.389     .02299       3.98      5.75 
    63.5     2.735     .00948       1.64 
    65.5     3.082     .00346       0.60      0.75 
    67.5     3.428     .00111       0.19 

Fig. 25. Normal Curve Fitted to Histogram Representing Weight 
Distribution of Glasgow Schoolgirls (Table 21) 

The smooth curve is plotted from the points (x, y) given in Table 29. The 
column headed f/c in that table gives the heights of the rectangles in the histogram, 
c = 4. When both the curve and the histogram are to be drawn, it is best 
to draw the curve first so that the presence of the histogram will not prejudice 
one into trying to make the curve fit the histogram. 

1 It is shown in Part II that a better estimate of the variance in the universe is 
obtained by multiplying the variance of the observed distribution by N/(N - 1). 
Because of this fact some writers, denoting this result by $s^2$, define the variance of 
an observed distribution by 

    $s^2 = \dfrac{1}{N - 1} \sum f_i (x_i - \bar{x})^2.$ 

The distinction between the two definitions is not an important one, in the 
author's opinion, for beginning students who are learning the descriptive 
methodology of statistics. And in curve fitting, the numerical difference is negligible 
because N is fairly large. The distinction is important, however, in the theory 
of small samples (Part II). 
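
The computation of the t, $\phi(t)$, and y columns of Table 29 may be sketched in Python as follows. Here $\phi(t)$ is computed directly from equation (4) instead of being read from the Appendix, so individual entries may differ slightly from values obtained by tabular interpolation; the variable names are assumptions.

```python
import math

MEAN, SD, N = 47.712, 5.772, 1000  # statistics of Table 21 as given in the text


def phi(t):
    """Standard form (4) of the normal curve."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


# End-x and mid-x values of the distribution: 27.5, 29.5, ..., 67.5.
x_values = [27.5 + 2 * i for i in range(21)]

rows = []
for x in x_values:
    t = (x - MEAN) / SD       # equation (a)
    y = N / SD * phi(t)       # equation (b)
    rows.append((x, t, phi(t), y))

for x, t, ordinate, y in rows:
    print(f"{x:5.1f} {t:7.3f} {ordinate:8.5f} {y:6.2f}")
```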

8. Graduation. The areas under the fitted curve and over the 
class intervals are called theoretical frequencies. Thus in Figure 25 
the shaded area represents the theoretical frequency corresponding 
to the observed frequency which is represented by the rectangle the 
mid-point of whose base is 41.5 pounds. The determination of the 
theoretical frequencies is called "graduation by the normal curve." 
It is a process of smoothing out the data to fit the curve. The method 
is shown in Table 30 for the data represented by Figure 25. 

In order to enter a table of standard areas we must change the 
end-x values into t values. These are given in the third column of 
Table 30. They are part of the values already computed for Table 29. 

The entries in the column headed $A = \int_{-\infty}^{t}$ are the (cum f)/N 
values of the standard curve for the given end-points. The entries in 
the column headed $\Delta A$ are obtained by differencing the preceding 
column. (See last paragraph of §9, Chapter I.) They are the percentages 
$p = f/N = \Delta A$ to be expected in the various intervals on 
the hypothesis of a normal distribution. Therefore $N\Delta A$ gives the 
numbers to be expected, that is, the theoretical frequencies. 

The student should study this table until he becomes familiar with 
all the operations involved and what they mean. He should distinguish 
between the purposes of Tables 29 and 30. 
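
The steps of a graduation (cumulative areas at the class boundaries, their differences, and multiplication by N) may be sketched in Python as follows. The sketch uses `statistics.NormalDist` from the standard library in place of the table of standard areas, so the theoretical frequencies may differ from tabular results in the last figure; the variable names are assumptions, and the observed frequencies are those of the distribution being graduated.

```python
from statistics import NormalDist

MEAN, SD, N = 47.712, 5.772, 1000
fitted = NormalDist(mu=MEAN, sigma=SD)

# Interior class boundaries 31.5, 35.5, ..., 63.5; the end classes are open.
boundaries = [31.5 + 4 * i for i in range(9)]
observed = [1, 14, 56, 172, 245, 263, 156, 67, 23, 3]

# Column A: cumulative relative frequency at each boundary.
cumulative = [0.0] + [fitted.cdf(b) for b in boundaries] + [1.0]

# Column "Delta A" times N: the theoretical frequencies.
theoretical = [N * (upper - lower) for lower, upper in zip(cumulative, cumulative[1:])]

for obs, theo in zip(observed, theoretical):
    print(f"{obs:5d} {theo:7.1f}")
print(sum(observed), round(sum(theoretical), 1))
```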

[Figure: a histogram rectangle with base c and height f/c] 

Table 30 

  Observed     Boundary                 A =                      N$\Delta A$ = Theoretical 
  Frequency       x          t     $\int_{-\infty}^{t}$    $\Delta A$      Frequency 

                $-\infty$   $-\infty$    .0000 
      1                                              .0025          2.5 
                 31.5      -2.809      .0025 
     14                                              .0147         14.7 
                 35.5      -2.116      .0172 
     56                                              .0602         60.2 
                 39.5      -1.423      .0774 
    172                                              .1553        155.3 
                 43.5      -0.730      .2327 
    245                                              .2527        252.7 
                 47.5      -0.037      .4854 
    263                                              .2587        258.7 
                 51.5       0.656      .7441 
    156                                              .1674        167.4 
                 55.5       1.350      .9115 
     67                                              .0679         67.9 
                 59.5       2.042      .9794 
     23                                              .0175         17.5 
                 63.5       2.735      .9969 
      3                                              .0031          3.1 
                $\infty$    $\infty$    1.0000 

  Totals                                            1.0000       1000.0 

9. Purpose of a Graduation. If, for the distribution of graduated 
frequencies, the mean, standard deviation, and total frequency are 
found, their values will be precisely those of the corresponding moments 
in the observed frequency distribution. This must be so, 
because these were the conditions imposed in the process of graduation. 
Moreover, the observed values of skewness and kurtosis as 
given by $\alpha_3$ and $\alpha_4$ will not differ appreciably from the theoretical 
values if the fitting of the normal curve to the observed distribution 
was justified. 

Since the above parameters characterize a distribution, the ob- 
serving student may wonder why a distribution should be graduated 
if the values of these constants are unaltered in the process. 

There are three main reasons why a student should be taught to graduate a 
curve. The first, and least important, has to do with the use of a smooth curve 
in place of a jagged sample. The second, and most important, is that it is 
necessary for the mathematical development of statistics that the mathematician 
should be told what assumptions he may make. These usually depend on the 
types of frequency curves which can be depended on to fit phenomena. . . . 
A third reason, intermediate in importance between the other two, is that, in 
testing a priori theories in various fields, it is often necessary to test the efficacy 
of the frequency distributions which are results of these theories. 1 

The second and third of the above reasons may seem somewhat 
abstruse, but it is not easy to give completely satisfactory explana- 
tions of them at this level of exposition. About all we can say at 
this time is that the distribution of variation of a variable x about its 
mean value is a fundamental statistical concept and in certain theo- 
retical investigations it is very important that we have mathemati- 
cal functions which are capable of representing such distributions. 
This is particularly true in sampling theory which will be discussed 
in Part II. 

The first reason is more readily understood. Occasionally in 
practical problems it may be desirable to use the theoretical fre- 
quencies obtained by graduation in place of the observed data which 
probably contain irregularities due in part to grouping, in part to 
sampling fluctuations. We cite here two illustrations. 

Example 1. A company which operates a chain of men’s haberdashery stores 
planned to bring out a new line of about 100,000 light weight sport shirts suitable 
for camping, hunting, etc. The question arose as to the determination of the 
number of each size that should be ordered from the factory. Their previous 
distribution of sizes had not been satisfactory because the demand for certain 
sizes had been different from the number manufactured. Therefore the statistical 
department was requested to recommend the distribution of the proposed order 
according to neck sizes. The solution of the problem hinged upon the availa- 
bility of data giving the measurements of neck circumferences of a large sample 
of men. Satisfactory data were found in the “Reports of the Medical Department
of the United States Army in the World War,” which gave a table of the
neck measurements in centimeters of 95,102 white troops at demobilization.
Since these data are tabulated in class intervals which are slightly different from
the ranges used in standard shirt-band sizes, a slight adjustment was necessary.
But essentially a normal curve was fitted to this distribution and the graduated
frequencies were taken as the number of potential customers for each shirt size.
The result was quite satisfactory.

1 Journal of the American Statistical Assoc., vol. XXVI, March 1931, Supplement,
p. 36.

Example 2. A well known and interesting illustration of the desirability of
smoothing occurs in the census returns. The census takers' records show more
persons alive at age 30 than at age 29, more at age 35 than at age 34, more at 40
than at 39, etc. This is probably due to the fact that men (as well as women)
do not tell their exact ages. A person who is actually 41 or 42 and known to be
40 or so, says he is 40. The recorded data show artificial bumps at every age
which is a multiple of 5. Naturally the Census Bureau prefers the smoothed
results to the observed. The student should not infer that the curve used to
smooth these data is the normal type. The “life curve” is a continuously
decreasing function. However, the same kind of quinquennial irregularity occurs
in other actuarial data which do approximate the form of a normal curve. Many
examples are given in Elderton, Frequency Curves and Correlation.


10. Probability. A frequency curve is sometimes called a proba- 
bility curve. The link connecting frequencies with probabilities 
has its starting point in the following definition: 

Definition. If out of N mutually exclusive and equally likely
events, f are distinguished by some property A, the probability of an
event bearing the property A is f/N.

The definition implies that probability is measured by a number in 
the range 0 to 1, the lower limit denoting impossibility and the upper 
limit denoting certainty. 

Since the total area under the curve represented by (4) is unity, 
any partial area denoted by (7) can be interpreted as the prob- 
ability that a value of t selected at random from a normal distribu- 
tion (4) lies between t = a and t = b. 

Example 1. Refer to the data of Table 8, Chapter I. Let us assume that a
normal curve was fitted to this distribution and that the fit seemed (by visual
inspection) to be reasonably good. Generalizing on the experience shown in the
table, the telephone company wishes to estimate the probability that a call (of
the same type of message as that in the table) will be between (say) 500 seconds
and 600 seconds in length.

Solution. Using (10),

a = (500 - 477.3)/148.5 = 0.15,
b = (600 - 477.3)/148.5 = 0.83.

$$\int_{0.15}^{0.83} \phi(t)\,dt = 0.24.$$

Example 2. Referring to Example 1 above, find the probability that the length
of a telephone call will differ numerically from the mean of the table by as much
as 5 minutes.

Solution. We find |t| = 300/148.5 = 2.02. The probability of a deviation
not greater, numerically, than 300 seconds is

$$P = 2\int_{0}^{2.02} \phi(t)\,dt = 0.96,$$

approximately. Then the probability of a numerical deviation as large as (or larger
than) 300 seconds is Q = 1 - P = 0.04. This would be represented graphically
by the area under the curve outside t = ±2.02.
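The two solutions above can be checked numerically. The following is a minimal sketch, assuming the fitted mean 477.3 seconds and standard deviation 148.5 seconds used in the solutions; the function names are illustrative only, and the area under φ(t) is obtained from the error function rather than from the printed tables. Because the text rounds t to two decimals before entering the tables, the results agree with 0.24 and 0.04 only to within rounding.

```python
import math


def normal_cdf(t: float) -> float:
    """Area under the standard normal curve from -infinity to t."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def prob_between(x_low: float, x_high: float, mean: float, sd: float) -> float:
    """Probability that a normal variate lies between x_low and x_high."""
    a = (x_low - mean) / sd   # lower limit in standard units
    b = (x_high - mean) / sd  # upper limit in standard units
    return normal_cdf(b) - normal_cdf(a)


MEAN, SD = 477.3, 148.5  # seconds, as used in the worked solutions

# Example 1: a call lasting between 500 and 600 seconds.
p_example_1 = prob_between(500, 600, MEAN, SD)

# Example 2: a deviation of at least 300 seconds from the mean, either way.
t = 300 / SD
q_example_2 = 1.0 - (normal_cdf(t) - normal_cdf(-t))

print(f"Example 1: P is about {p_example_1:.3f}")
print(f"Example 2: Q is about {q_example_2:.3f}")
```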


11. Probability Paper. The cumulative frequencies for the normal
curve φ(t) are given by

$$A = \int_{-\infty}^{t} \phi(t)\,dt.$$

As t varies from -∞ to +∞, A varies from 0 to 1, and for the finite
range t = ±3 (commonly met in practice) A varies from 0.00135 to
0.99865. (Verify.) Regarding A as a function of t, values of (t, A)
from the tables may be plotted and the resulting points joined by a
smooth curve.

[Figure 20. Ogive of the normal curve: A plotted against t.]

When graphed on an algebraic scale this curve is the ogive of the
normal curve. It is also called the integral curve of φ(t). As indicated
in Figure 20, the ordinate of the ogive is zero at t = -∞,
.5 at t = 0, and the ogive approaches the line A = 1 asymptotically.

Now imagine the vertical scale of Figure 20 stretched in such a
way that the ogive becomes a straight line. The stretching required
will be least around the line A = 0.5 and gradually increase as
the distance from this line increases.

Paper so ruled that the (t, A) graph is a straight line is called
probability paper. It is readily obtainable 1 and is convenient for
many purposes. Thus, by plotting cum f for an observed distribution
on probability paper, one may observe how closely it approximates
a straight line and hence get an idea of how nearly normal it
is. One may thus locate graphically the median, quartiles, etc., and
estimate frequencies between given limits.

1 The Codex Book Company, New York.
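The ruling of probability paper can be imitated numerically. The sketch below is an illustration under stated assumptions, not a description of how the commercial paper is printed: it takes the height at which a cumulative proportion A is plotted to be the value of t whose normal area equals A (the inverse of the ogive), using `statistics.NormalDist` from the Python standard library. Equal steps in t then give equal steps in plotted height, which is why the normal ogive becomes a straight line.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

t_values = [-3, -2, -1, 0, 1, 2, 3]

# Ordinates of the ogive on an ordinary (algebraic) scale.
A_values = [std_normal.cdf(t) for t in t_values]

# Heights of the same points on the stretched scale: the inverse of the ogive.
paper_heights = [std_normal.inv_cdf(A) for A in A_values]

# Successive gaps: unequal on the algebraic scale, equal on probability paper.
gaps_algebraic = [b - a for a, b in zip(A_values, A_values[1:])]
gaps_paper = [b - a for a, b in zip(paper_heights, paper_heights[1:])]

for t, A, h in zip(t_values, A_values, paper_heights):
    print(f"t = {t:+d}   A = {A:.5f}   plotted height = {h:+.3f}")
```

The printed values of A at t = -3 and t = +3 also serve as the check asked for by "(Verify.)" in the text.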

A more complete discussion giving references to writers who sug- 
gested and developed the use of probability paper may be found in 
the Journal of the American Statistical Association, vol. XXVI, June 
1931, p. 178. 

Exercises 


1. Construct three normal curves on the same axes according to the following
specifications. Compute ordinates at intervals of .5σ from the mean in
the range x̄ ± 3σ.

Curve    σ_x    x̄     N
A        10     50    400
B        10     50    800
C        10     50    1200

Suggested form for computations:

 x      t     φ(t)    y_A    y_B    y_C
20     -3
 .      .
 .      .
80      3

2. Construct three normal curves on the same axes according to the following
specifications. Compute ordinates at intervals of .5σ from the mean.

Curve    σ_x    x̄     N
A        15     50    1000
B        10     50    1000
C         5     50    1000

Suggestion:

        x                                y
 A      B      C      t     φ(t)    A      B      C
 5     20     35     -3
 .      .      .      .

Observe that:

$$y_C = 200\,\phi(t), \qquad y_B = \tfrac{1}{2}\,y_C = 100\,\phi(t), \qquad y_A = \tfrac{1}{3}\,y_C = \tfrac{200}{3}\,\phi(t).$$

3. Verify the entries in Tables 29 and 30.

4. For the following distribution:

(a) Find the equation of the best fitting normal curve, and plot the curve
and histogram.

(b) Find the graduated frequencies.

mid-x    2    4    6    8    10
f        1    4    6    4    1


5. Graduate the distribution in Table 8, §11, Chapter I. Also find the ordinates
of the best fitting normal curve and plot the curve and histogram.

6. A distribution of the weekly wages of 906 anthracite miners showed the
following results:

x̄ = $36.13        α_3 = 0.007
σ_x = $8.87        α_4 = 3.02

Assuming a normal distribution, estimate the number of the 906 miners
who received weekly wages (a) in excess of $45, (b) less than $25.

7. An urban electric railway company operating a large city subway uses
thousands of electric light bulbs in its underground stations. On January
1, 1947, the company put into service 5000 new light bulbs. Let it be
assumed that these 5000 bulbs will have a mean life of 50 days, a standard
deviation of 19 days, and that their lives conform to the normal
curve.

If January 1 is counted as a full day in the life of the bulbs: (a) How
many bulbs out of the 5000 new ones would have had to be replaced by
midnight January 31, 1947? (b) How many by March 10, 1947?

8. Which properties of the normal curve may be used as criteria in passing 

judgment on the normality of an observed distribution? Would you say 
that the distributions referred to in Table 23 are approximately normal? 

9. Graph the ogive of the normal curve by plotting values of (t, A) in the range
t = ±3, (a) on an algebraic scale, (b) on probability paper.

10. What famous mathematicians' names are associated with the normal curve?
When did these men live? Which of them should most appropriately be
credited with the discovery of this curve?

11. (Camp) The standard deviation of a certain set of 100,000 high school
grades was 11%, and the mean grade was 78%. Assume the distribution
to have been normal, and, being careful not to confuse percentage in the
sense of grade with a percentage of frequency, answer the following questions:
How many grades were (a) above 90%, (b) below 70%? (c) What
was the highest grade of the lowest 1000? (d) Within what limits did the
middle 90,000 lie? (e) What was the semi-interquartile range?

12. (Camp) Answer all the questions of Exercise 11 with reference to a set of
100,000 grades in which the median was 83% and Q_3 was 90%. Also
find σ_x.

13. In a certain normal distribution, N = 1000, x̄ = 50, σ_x = 10. For this
distribution:

(a) Convert the following x's into the corresponding t's.

x    15   20   25   30   35   40   45   50   55   60   65   70   75   80   85
t

(b) Find from the tables the values of φ(t) for the t values in (a).

(c) Convert the φ(t) values obtained in (b) into y values.

(d) Plot the (x, y) values in (a) and (c) and draw a smooth curve through
them.

(e) Find the cumulative relative frequencies,

$$A = \int_{-\infty}^{t} \phi(t)\,dt,$$

for the values of t in (a).

(f) Difference your results in (e) by finding ΔA.

(g) Convert the percentages in (f) into frequencies.

(h) Explain the meaning of your results in (g) with reference to the figure
for (d).

(i) Find the number of variates between x = 42 and x = 74.

(j) Find the values of x for which cum f = 250, 600, 750, respectively.

14. Given a normal distribution in which N = 800, x̄ = 40, σ_x = 7. Find the
numerical value of each of the following:

$$Q_1,\quad Q_2,\quad Q_3,\quad E,\quad N\int_{t=0}^{t=\infty} \phi(t)\,dt.$$


15. Suppose N = 5000 variates are normally distributed such that x̄ = 50 and
E = 13.49. Without using the tables find the value of the following:
quartiles, median, mode, standard deviation, mean deviation, x for which
cum f = 1250.

16. Suppose there are N values of a variable v which are normally distributed
with mean = 0 and variance = 25.

(a) Give the equation of the curve which represents the distribution.

(b) If there are 793 values between v = -5 and v = 0, determine N.

(c) What percent of N have values larger than v = 10?

(d) Determine the value of v for which cum f = .75N.


CHAPTER VII 


CURVE FITTING 

1. Empirical Expressions. The preceding chapters have dealt 
with the description and characterization of frequency distributions. 
We have considered three general methods of description: (1) graphi- 
cal devices, (2) the method involving calculation of averages and 
measures of dispersion, (3) the method which is sometimes called 
analytical. This latter method consists in describing the distribution
by an equation, and we considered only one such analytical expression,
the normal curve.

However, another branch of statistics is concerned with data which may
not be classed under frequency distributions, but which may be described
by simple equations.

[Figure: Example 1. Expectation of Life 1 at various ...]

When one variable is a function of another in applied mathematics the
mathematical relation between them is not always known. As we mentioned
in Chapter II, the only information regarding this functional relationship
may be a set of pairs of values obtained by experimental or observational
means. These pairs of values may be regarded as coordinates of points and
plotted. In doing so, the values of the variable which is regarded as
independent are taken as abscissas, and those of the dependent variable as
ordinates.

The general problem in such cases is to find, if possible, an analytic
expression of the form y = f(x) for the functional relationship suggested
by the data. Equations obtained to fit observed data as well as possible
are called empirical to distinguish them from the rational expressions of
pure mathematics which can be derived from reasoning. This general problem
is called curve fitting. It is also sometimes referred to as “smoothing”
the given data.

[Figure: Example 2. Yearly Production of Cigarettes in the United States.]

We will consider three types of functions: linear, quadratic, and
exponential.

1 By expectation of life at any age is meant the average number of years lived
by persons attaining that age, as given in the American Experience Mortality
Table.

2. Linear Functions. We know from algebra that the general form of a
linear equation in two variables is

Ax + By = C

where A, B, and C are arbitrary constants.

When B ≠ 0, the equation may be solved for y, giving
y = -(A/B)x + C/B which is of the form

(1)    y = mx + k

and which is the form we will ordinarily use to represent a straight line.

The special cases where A or B or C is zero are as follows:

When A = 0, then y = C/B, which is of the form y = k. This is
a line parallel to the x-axis. When B = 0, the equation takes the
form x = k which is a line parallel to the y-axis. When C = 0, then
Ax + By = 0 which is a line passing through the origin.

The graph of (1) is a straight line (which explains the term “linear”).
A characteristic property of a linear function is revealed at once by
its graph. This is the fact that the ratio of a change in y to the
corresponding change in x is constant. Thus, if two points (x_1, y_1)
and (x_2, y_2) are chosen on the line, the value of the ratio

$$m = \frac{y_2 - y_1}{x_2 - x_1}$$

is independent of the points chosen. This ratio gives the average
rate of change of any function over the interval Δx = x_2 - x_1. In
the case of a linear function, m defines the rate of change of the
function.

Example 2. Yearly Production of Cigarettes in the United States

Year    Billions
1923      66.7
1924      72.7
1925      82.3
1926      92.1
1927      93.0
1928     100.6

Graphically, m is the slope of the line. It is the tangent of the
angle of inclination α (alpha) which the line makes with the positive
x-axis. 1 Lines having the same slope are parallel, and conversely.

It is shown in analytic geometry that we may obtain the slope of a
straight line from its equation if we solve for y and take the coefficient
of x. Thus in 2x - y = 5, y = 2x - 5 and the slope is 2.

Conversely, if we know the slope of a line and the coordinates of any
point on the line we can write its equation from the relation

(2)    y - y_1 = m(x - x_1)

which is called the point-slope form of a straight line. Thus, given
that (2, -1) is a point on a line whose slope is 2, the equation of the
line is therefore y + 1 = 2(x - 2) or 2x - y = 5.

Or again, remembering that m is defined by a ratio involving the
coordinates of two points on a line, we can obtain the equation of a
line if we know any two points which lie on it. From the definition
of m and (2), we have

$$y - y_1 = \frac{y_2 - y_1}{x_2 - x_1}\,(x - x_1) \tag{3}$$

which is known as the two-point form of a straight line. Thus, given
that (2, -1) and (6, 7) are two points on a line, its equation is

$$y + 1 = \frac{7 + 1}{6 - 2}\,(x - 2) \quad\text{or}\quad 2x - y = 5.$$
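The two-point form can be turned into a small routine that returns the constants m and k of y = mx + k. This is a sketch for illustration only; the function name `line_through` is not from the text, and exact fractions are used so that the result for the points (2, -1) and (6, 7) is free of rounding.

```python
from fractions import Fraction


def line_through(p1, p2):
    """Return (m, k) of the line y = m*x + k through two points, via form (3)."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        # Vertical line: the slope does not exist (division by zero is excluded).
        raise ValueError("vertical line: slope m does not exist")
    m = Fraction(y2 - y1, x2 - x1)  # m = (y2 - y1) / (x2 - x1)
    k = y1 - m * x1                 # from y - y1 = m(x - x1), evaluated at x = 0
    return m, k


m, k = line_through((2, -1), (6, 7))
print(f"y = {m}x + ({k})")  # the same line as 2x - y = 5
```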

1 When the line is vertical, α = 90° and m does not exist. Then Δx = 0 and
division by zero is excluded in our algebra.

3. Quadratic Function. A quadratic function of a variable v
is a polynomial of the second degree in v which may be expressed in
the form Av² + 2Bv + C where A, B, and C are fixed real numbers.
The minimum value of such a function is useful in statistics. We
have

$$Av^2 + 2Bv + C = \frac{1}{A}\left[A^2v^2 + 2ABv + AC\right] = \frac{1}{A}\left[(Av + B)^2 + (AC - B^2)\right].$$

Since (Av + B)² is positive or zero and (AC - B²) does not involve
the variable, we have the following:

Theorem I. If A is positive the minimum value of Av² + 2Bv + C
occurs when Av + B = 0; the minimum value is (AC - B²)/A.

The graph of the equation y = Av² + 2Bv + C, (A > 0), is a parabola
which opens upward and whose vertex is where v = -B/A. Of course, the
function has its minimum value at this vertex, viz.: (v_0, y_0)
where v_0 = -B/A, y_0 = (AC - B²)/A.
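Theorem I is easy to check numerically. The sketch below assumes the quadratic is written in the text's form Av² + 2Bv + C (note the factor 2 on the middle term); the function name and the sample polynomial v² - 6v + 10 are illustrative choices, not taken from the text.

```python
def quadratic_minimum(A: float, B: float, C: float):
    """Vertex (v0, y0) of y = A*v**2 + 2*B*v + C for A > 0, by Theorem I."""
    if A <= 0:
        raise ValueError("Theorem I requires A > 0")
    v0 = -B / A               # where A*v + B = 0
    y0 = (A * C - B * B) / A  # the minimum value
    return v0, y0


# v**2 - 6v + 10 has A = 1, 2B = -6 (so B = -3), C = 10.
v0, y0 = quadratic_minimum(1, -3, 10)

# Crude confirmation: no value on a fine grid falls below y0.
grid = [i / 100 for i in range(-1000, 1001)]
lowest_on_grid = min(v * v - 6 * v + 10 for v in grid)

print(v0, y0, lowest_on_grid)
```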

Exercises 

1. (Wilson and Tracy) The premium ($y) on a $1000 life insurance policy for
various ages (x years) is given in the following table. Draw a graph
exhibiting y as a function of x. Estimate from the graph the premium at
age 32 and at age 43; also the age at which the premium is $52.

x     20      25      30      35      40      45      50      55      60
y    18.78   21.02   23.86   27.54   32.36   38.83   47.68   59.88   76.04

2. Find an equation of each of the lines through two points given as follows:

(a) (2, 6), (4, 5); (b) (0, 3), (1, 6).

3. Find the equation of a line through the point (2, 3) and parallel to the line
4x + 5y = 7.

4. (a) Find the value of x for which f(x) = 2x² - 8x + 9 has a minimum
value. (b) What is this minimum value? (c) Draw a graph of y = f(x)
and show the meaning of your answers to (a) and (b).

5. How would the theorem in §3 be affected if A < 0?

6. Prove that the second moment of x is a minimum when taken about the
mean of x.

Hints. Solution 1. Let

$$f(v) = \frac{1}{N}\sum_{i=1}^{N}(x_i - v)^2 = v^2 - 2\bar{x}v + \frac{1}{N}\sum_{i=1}^{N}x_i^2.$$

By the theorem of §3, show that f(v) is a minimum when v = x̄.

Solution 2. By definition,

$$\mu_2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2, \qquad \mu_2' = \frac{1}{N}\sum_{i=1}^{N}(x_i - v)^2, \quad v \neq \bar{x}.$$

Is μ_2 < μ_2'?

Solution 3, for calculus students. From f(v) as derived above,

$$f'(v) = -\frac{2}{N}\sum_{i=1}^{N}(x_i - v).$$

Set f'(v) = 0 and solve for v. Since f''(v) > 0, v = x̄ yields a minimum,
not a maximum.

7. Show that the value of k for which

$$f(k) = Nk^2 + 2k\left(m\sum_{i=1}^{N}x_i - \sum_{i=1}^{N}y_i\right) + C$$

is a minimum is defined by

$$m\sum_{i=1}^{N}x_i + Nk = \sum_{i=1}^{N}y_i.$$


4. Fitting a Straight Line. The preceding discussion is intended
as a basis for the presentation of certain methods of fitting a line to
data. The equation y = mx + k represents a family or set of
lines corresponding to different values of the arbitrary constants
m and k. As noted previously, such constants are called parameters.
The process of finding the best fitting line for any given data consists
in determining m and k. By “best fitting” we mean best under a
criterion of approximation specified by a method. We will consider
three such methods: (a) graphical, (b) the method of moments of
ordinates, (c) the method of least squares.

5. Graphically. A straight line is drawn (preferably with the aid
of a transparent ruler) to fit as closely as possible the plotted points.
To find the equation of this line, select two points on the line and
estimate their coordinates (x_1, y_1) and (x_2, y_2). Substituting these
coordinates in the “two-point” form of the line (3), we get the desired
equation. If the first point is chosen so that x_1 = 0 the numerical work
of simplifying the equation is somewhat lessened.

Example 3. Fit a line graphically to the data in Example 2.

We take the origin of x at 1923, hence from the figure (x_1 = 0, y_1 = 67)
and (x_2 = 5, y_2 = 100). By equation (3),

$$y - 67 = \frac{100 - 67}{5 - 0}\,(x - 0).$$

Therefore,

y = 6.6x + 67

is the required equation.

   x           y
(1923) 0      66.7
       1      72.7
       2      82.3
       3      92.1
       4      93.0
(1928) 5     100.6

The graphical method is open to the objection that it depends 
upon the judgment of the investigator. Different people will lo- 
cate the line in different positions and therefore obtain different equa- 
tions. However, where only approximate results are needed it is 
usually quite satisfactory. 

6. Method of Moments. In equation (1) y is not only a function
of x but it is also a function of the parameters m and k. This functional
relationship may be expressed symbolically by the notation
f(x, m, k). Given the functional form of a curve y = f(x, a, b, c, ...),
the parameters a, b, c, ... may be determined by obtaining
expressions for as many moments of the computed or functional y's as
there are parameters in the function and equating these to the numerical
moments of corresponding order of the observed or empirical y's.
A solution of the resulting equations, theoretically possible, gives
the “best” values of the parameters. This is the method of moments
of ordinates. For a set of N values of (x_i, y_i) the rth moment of y is
defined by the expression

$$\frac{1}{N}\sum_{i=1}^{N} x_i^{\,r}\, y_i$$

where r is zero or a positive integer.

In fitting a straight line by this method we obtain two equations
involving m and k if we equate the zeroth and first moments of the
observed y's to the zeroth and first moments, respectively, of the y's
computed from the assumed equation y = mx + k. All moments
are taken about the origin of x. These two equations may then be
solved for m and k. The procedure will be made clear by the figure
and explanation below.



     Observed                 Computed
   x       _o y            x        _c y
  x_1      y_1            x_1     mx_1 + k
  x_2      y_2            x_2     mx_2 + k
   .        .              .          .
  x_i      y_i            x_i     mx_i + k
   .        .              .          .
  x_N      y_N            x_N     mx_N + k

Suppose we are given N pairs of values of x and y. Denote the
given or observed y's by _o y and the computed y's by _c y. For the
observed y's, the first moment is (1/N) Σ x_i y_i and the zeroth moment is
(1/N) Σ y_i. By a “computed y” corresponding to any value of x we
mean the result obtained by substituting that value of x in the equation
y = mx + k, and solving for y. Thus, for any value of x, say
x_i, we obtain mx_i + k for the corresponding computed y_i. Graphically,
it is an ordinate of the line. Therefore, the first moment of
the computed y's is (1/N) Σ x_i (mx_i + k), and the zeroth moment is
(1/N) Σ (mx_i + k). Applying the principle of moments we have

                    observed          computed
zeroth moment       Σ y_i        =    Σ (mx_i + k)
first moment        Σ x_i y_i    =    Σ x_i (mx_i + k)

where the summations run from 1 to N.

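To solve for m and k we write the preceding equations in the following
form:

$$\begin{cases} m\sum x_i + kN = \sum y_i \\ m\sum x_i^2 + k\sum x_i = \sum x_i y_i \end{cases} \tag{4}$$

By determinants,

$$m = \frac{\begin{vmatrix} \sum y & N \\ \sum xy & \sum x \end{vmatrix}}{\begin{vmatrix} \sum x & N \\ \sum x^2 & \sum x \end{vmatrix}} = \frac{(\sum y)(\sum x) - N\sum xy}{(\sum x)^2 - N\sum x^2}, \qquad k = \frac{\begin{vmatrix} \sum x & \sum y \\ \sum x^2 & \sum xy \end{vmatrix}}{D} = \frac{(\sum x)(\sum xy) - \sum y\sum x^2}{D}. \tag{5}$$

The determinant D in the expression for k is the same as that in the
denominator of the expression for m. [In order to solve equations
(4) for the values (5) it is assumed that D does not vanish.] The
terms in the expressions for m and k refer to the original data. When
these expressions have been evaluated they replace m and k in the
equation y = mx + k.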


Example 4. Find by the method of moments the best fitting line for the data
in Example 2.

         x        y        xy       x²
         0       66.7       0        0
         1       72.7      72.7      1
         2       82.3     164.6      4
         3       92.1     276.3      9
         4       93.0     372.0     16
         5      100.6     503.0     25
Totals  15      507.4    1388.6     55

$$m = \frac{(507.4)(15) - 6(1388.6)}{(225) - 6(55)} = 6.86, \qquad k = \frac{15(1388.6) - 55(507.4)}{D} = 67.4.$$

Therefore,

y = 6.86x + 67.4.
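The arithmetic of Example 4 can be reproduced directly from formulas (5). The following is a minimal sketch; the function name `fit_line_by_moments` is an illustrative choice, and the data are the six yearly values of Example 2 with x counted in years from 1923.

```python
def fit_line_by_moments(xs, ys):
    """Return (m, k) of y = m*x + k from formulas (5)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    d = sum_x ** 2 - n * sum_x2  # the determinant D, assumed not to vanish
    if d == 0:
        raise ValueError("D vanishes: the x's are all equal")
    m = (sum_y * sum_x - n * sum_xy) / d
    k = (sum_x * sum_xy - sum_y * sum_x2) / d
    return m, k


years_since_1923 = [0, 1, 2, 3, 4, 5]
billions = [66.7, 72.7, 82.3, 92.1, 93.0, 100.6]

m, k = fit_line_by_moments(years_since_1923, billions)
print(f"y = {m:.2f}x + {k:.1f}")  # prints: y = 6.86x + 67.4
```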

7. An Alternative Procedure. In practice, it is sometimes easier to 
remember the procedure of fitting a line by the method of moments if 
one obtains the equations in (4) directly from the data instead of using 
the formulas for m and k. This will involve the following three steps: 

(a) Substitute each of the given pairs of values in y = mx + k and 
add the corresponding members of the resulting “ equations.” This 
gives the first equation in (4). 

(b) Multiply each “equation” in (a) by the coefficient of m in
that “equation” and add the corresponding members of the resulting
“equations.” This gives the second equation in (4).

(c) Solve the equations simultaneously. This will give the
required values of m and k.

The algebraic statements which we designated “equations” (denoting
that the statements are only approximately true) are called
observation equations in the theory of errors. A linear combination
of a set of linear observation equations is a true equation.

Example. Verify, for the data in Example 2, that the above procedure gives
the same values of m and k as the formulas.

Step (a)                      Step (b)

 66.7 =  0m +  k
 72.7 =  1m +  k                72.7 =  1m +  1k
 82.3 =  2m +  k               164.6 =  4m +  2k
 92.1 =  3m +  k               276.3 =  9m +  3k
 93.0 =  4m +  k               372.0 = 16m +  4k
100.6 =  5m +  k               503.0 = 25m +  5k
507.4 = 15m + 6k              1388.6 = 55m + 15k

Step (c)

Solving the equations, we obtain m = 6.86, k = 67.4, as before.


8. Least Squares. Case I. A standard method of fitting a curve
to empirical data is one known as the method of least squares. Assume,
as before, that the plotted data suggest the linear relationship
y = mx + k. Let d_i represent the difference between the ordinate of
any given point and the corresponding ordinate of the line, that is,
d_i = [y_i - (mx_i + k)]. These differences are called residuals. The
method of least squares is based upon the following principle.

Principle of Least Squares. The “best” estimate of a parameter
is that for which the sum of weighted squares of the residuals is a
minimum. 1

The sum is to be taken over all the observations that are subject
to error. We shall assume that the observations are all of equal
weight; consequently we may let each of the weights be unity. Then
the parameters m and k are estimated by imposing the condition that
Σ d_i² (summed from 1 to N) be a minimum. Now

$$\begin{aligned} \sum d^2 &= \sum\left[y - (mx + k)\right]^2 \\ &= Nk^2 + 2mk\sum x + m^2\sum x^2 - 2k\sum y - 2m\sum xy + \sum y^2. \end{aligned} \tag{6}$$

This is a quadratic polynomial in k. We may write it in the form

$$f(k) = Nk^2 + 2k\left(m\sum x - \sum y\right) + C \tag{6a}$$

where C represents the terms not involving k. Then according to
Theorem I the minimum value of f(k) occurs when

$$k = \frac{\sum y - m\sum x}{N},$$

that is, when

$$Nk + m\sum x - \sum y = 0.$$

The right member of (6) is also a quadratic polynomial in m. We
must choose m so that

$$m\sum x^2 + k\sum x - \sum xy = 0.$$

These last two equations 2 are the same as (4). When obtained by
the method of least squares they are called normal equations. Therefore
the values of m and k in (5) determine the best fitting line by
both the method of moments and of least squares. It can be shown
that the two methods give the same result for any polynomial. 3

It is interesting to observe that the sum of the residuals is zero.
Thus it can easily be shown that Σ[y - (mx + k)] = 0, when the
values given in (5) are substituted for m and k. This property and
the fact that the sum of the squares of the residuals is a minimum are
quite analogous to two similar properties of the arithmetic mean, viz.,

(1) The sum of deviations from the mean is zero.

(2) The sum of the squares of deviations from the mean is less than
the sum of the squares of such deviations taken from any other value,
i.e., μ_2 < μ_2'.

1 For further information about this principle and a discussion of weights, the
following books are recommended: (a) Reference 4. (b) Statistical Mathematics,
by A. C. Aitken. Oliver and Boyd.

2 The student of calculus would obtain these equations as follows. Let
f(m, k) = Σ(y - mx - k)². Then differentiate f(m, k) partially with respect
to m and k, respectively, and equate the results to zero.

3 See American Mathematical Monthly, September, 1923.
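Both properties of the least squares line can be observed numerically. The sketch below uses the data of Example 2 and, as an assumption made for brevity, computes m and k in the equivalent deviation-from-mean form rather than by formulas (5); the function names are illustrative. It then checks that the residuals sum to zero and that shifting m or k away from the fitted values increases the sum of squared residuals.

```python
def least_squares_line(xs, ys):
    """Least squares m and k for y = m*x + k (deviation-from-mean form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    k = mean_y - m * mean_x
    return m, k


def sum_sq_residuals(xs, ys, m, k):
    """Sum of d**2 where d = y - (m*x + k)."""
    return sum((y - (m * x + k)) ** 2 for x, y in zip(xs, ys))


xs = [0, 1, 2, 3, 4, 5]                       # years since 1923
ys = [66.7, 72.7, 82.3, 92.1, 93.0, 100.6]    # billions

m, k = least_squares_line(xs, ys)
residuals = [y - (m * x + k) for x, y in zip(xs, ys)]
best = sum_sq_residuals(xs, ys, m, k)

# Any other line gives a larger sum of squares.
perturbed = [
    sum_sq_residuals(xs, ys, m + dm, k + dk)
    for dm, dk in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.5), (0.0, -0.5), (0.1, -0.5)]
]

print(f"sum of residuals = {sum(residuals):.2e}")
print(f"minimum sum of squares = {best:.3f}")
```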


Case II. In Case I distances between the points and the line were
taken parallel to the y-axis. But we may just as logically, from a
formal point of view, take distances parallel to the x-axis, and
make the x residuals the basis for a least squares criterion of best
fit. Similarly, for the method of moments: we can set up two
equations such that the first moment of the observed x's equals
the first moment of the computed x's, and the zeroth moment of the
observed x's equals the zeroth moment of the computed. To do this
let x = m_2 y + b represent the equation of the line. Then by the
principle of moments we have

$$\sum x = \sum (m_2 y + b), \qquad \sum xy = \sum y\,(m_2 y + b).$$

Solving for m_2 and b we obtain

$$m_2 = \frac{\sum x\sum y - N\sum xy}{D}, \qquad b = \frac{\sum y\sum xy - \sum x\sum y^2}{D}, \qquad D = \left(\sum y\right)^2 - N\sum y^2. \tag{7}$$

If we determined m_2 and b by making the sum of the squares of
the x residuals a minimum we would get the results given in (7).
The expressions in (7) are those of (5) with x and y interchanged.
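Since (7) is (5) with x and y interchanged, one routine can serve both cases. The sketch below is an illustration under that reading: a hypothetical helper `fit_by_formula_5` fits v = slope·u + intercept, so that Case I is obtained with (x, y) and Case II with (y, x). The data are those of Example 2; the product m·m_2 is printed because it shows at a glance that the two lines differ unless the points are exactly collinear.

```python
def fit_by_formula_5(us, vs):
    """Fit v = slope*u + intercept using formulas (5), with u in the role of x."""
    n = len(us)
    sum_u, sum_v = sum(us), sum(vs)
    sum_uv = sum(u * v for u, v in zip(us, vs))
    sum_u2 = sum(u * u for u in us)
    d = sum_u ** 2 - n * sum_u2
    if d == 0:
        raise ValueError("D vanishes")
    slope = (sum_v * sum_u - n * sum_uv) / d
    intercept = (sum_u * sum_uv - sum_v * sum_u2) / d
    return slope, intercept


xs = [0, 1, 2, 3, 4, 5]                       # years since 1923
ys = [66.7, 72.7, 82.3, 92.1, 93.0, 100.6]    # billions

m, k = fit_by_formula_5(xs, ys)    # Case I:  y = m*x + k
m2, b = fit_by_formula_5(ys, xs)   # Case II: x = m2*y + b, i.e. formulas (7)

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

print(f"Case I : y = {m:.3f}x + {k:.2f}")
print(f"Case II: x = {m2:.4f}y + ({b:.3f}), slope in the xy-plane = {1 / m2:.3f}")
print(f"m * m2 = {m * m2:.4f}")
```

Both lines pass through the point of means; they coincide only when m·m_2 = 1.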

In general, Cases I and II will give different lines. Case I assumes
that the observed points fail to fall on the line because of errors
in the ordinates only. Case II assumes that only the x-coordinates
are in error. In the application of curve fitting to economic data,
etc., the formal mathematical procedure should not be used without
first verifying that the underlying assumptions involved in the procedure
are justified. Inasmuch as the independent variable x can
be controlled in experimental and observational data, the errors
usually exist only in the y's. Therefore, in speaking of the best
line by the method of moments or least squares it is conventional
to mean the line which fits best in the sense of (5) rather than (7).

Case III (for calculus students). A third line can be obtained
which fits best in the sense that the sum of the squares of the
perpendicular distances from the points to the line is a minimum.

Let us suppose the equation of this line to be in the form

y' = mx' + k

where x' = x - x̄, y' = y - ȳ, and (x̄, ȳ) is the mean of the observed
data. The distance from this line to a point (x_i', y_i') representing
a pair of observed values (referred to their respective
means as origin) is, from analytics,

$$d_i = \frac{y_i' - mx_i' - k}{\sqrt{m^2 + 1}}.$$

We wish to make (1/N) Σ d_i² a minimum. Therefore we are to choose
m and k so that the function

$$f(m, k) = \frac{1}{N(m^2 + 1)}\sum_{i=1}^{N}\left(y_i' - mx_i' - k\right)^2$$

is a minimum. This function may be written in the form

$$f(m, k) = \frac{1}{m^2 + 1}\left(\sigma_y^2 + k^2 + m^2\sigma_x^2 - 2mr\sigma_y\sigma_x\right)$$

where r is a convenient symbol defined by the relation

$$r\sigma_y\sigma_x = \frac{1}{N}\sum_{i=1}^{N} x_i'\,y_i'.$$

To make f(m, k) a minimum we first put k² = 0. Then we equate
to zero the first derivative with respect to m and obtain

$$m^2 r\sigma_y\sigma_x - m\left(\sigma_y^2 - \sigma_x^2\right) - r\sigma_y\sigma_x = 0.$$

Solving for m we have

$$m = \frac{\left(\sigma_y^2 - \sigma_x^2\right) \pm \left[\left(\sigma_y^2 - \sigma_x^2\right)^2 + 4r^2\sigma_y^2\sigma_x^2\right]^{1/2}}{2r\sigma_y\sigma_x}.$$

Therefore the required equation is y' = mx'. Referred to the origin
of x and y, this is

y - ȳ = m(x - x̄)

where m is determined above.

This line is the appropriate one to fit if there are errors in both x
and y of the empirical data.
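The formula for m in Case III has two roots, corresponding to two perpendicular lines through the point of means; one gives the minimum of f(m, 0) and the other the maximum. The sketch below, offered as an illustration only, evaluates both roots for the data of Example 2 and keeps whichever makes the mean squared perpendicular distance smaller. The function names are hypothetical, and the case r = 0 (where the formula divides by zero) is simply rejected.

```python
import math


def mean_sq_perp_dist(m, var_x, var_y, cov):
    """f(m, 0) = (var_y + m^2 var_x - 2 m cov) / (m^2 + 1), with cov = r*sigma_y*sigma_x."""
    return (var_y + m * m * var_x - 2 * m * cov) / (m * m + 1)


def perpendicular_fit(xs, ys):
    """Slope m of y - mean_y = m (x - mean_x) minimizing perpendicular distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    if cov == 0:
        raise ValueError("r = 0: the slope formula is not defined")
    diff = var_y - var_x
    root = math.sqrt(diff ** 2 + 4 * cov ** 2)
    candidates = [(diff + root) / (2 * cov), (diff - root) / (2 * cov)]
    m = min(candidates, key=lambda s: mean_sq_perp_dist(s, var_x, var_y, cov))
    return m, mean_x, mean_y, (var_x, var_y, cov)


xs = [0, 1, 2, 3, 4, 5]                       # years since 1923
ys = [66.7, 72.7, 82.3, 92.1, 93.0, 100.6]    # billions

m_perp, mean_x, mean_y, (var_x, var_y, cov) = perpendicular_fit(xs, ys)
print(f"y - {mean_y:.2f} = {m_perp:.3f}(x - {mean_x:.1f})")
```

For these data the resulting slope falls between the Case I slope and the slope implied by Case II.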

A special problem under Case I. Sometimes problems arise where
the line to be fitted is restricted in some way. For example, the
nature of the problem may require that the line shall pass through
the origin. If this condition is imposed, (1) takes the form

    $y = mx$.

The least squares estimate of the slope of this line depends upon
various assumptions about the errors. If y is subject to error and
x is free of error, and if the observations are all of equal weight, it is
easy to show that

    $m = \dfrac{\sum xy}{\sum x^2}$

by the principle of least squares. This principle will give different
estimates of m under different assumptions about the weights of
the observations. Several particular solutions of the more general
problem and some applications will be found in §15 of reference 4
on page 6. (See also our Exercise 11, p. 189.)
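
A minimal Python sketch of the line through the origin follows. With equal weights the slope is the sum of the products xy divided by the sum of the squares of x. The optional `weights` argument uses the usual weighted least squares form, which is an assumption of this sketch and is not taken from the text; the function name is hypothetical.

```python
def slope_through_origin(xs, ys, weights=None):
    """Least squares slope of y = m*x, errors assumed to lie in y only.

    With weights=None every observation has equal weight. The weighted
    form sum(w*x*y) / sum(w*x*x) is an assumption of this sketch.
    """
    if weights is None:
        weights = [1.0] * len(xs)
    numerator = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    denominator = sum(w * x * x for w, x in zip(weights, xs))
    if denominator == 0:
        raise ValueError("all weighted x values are zero")
    return numerator / denominator
```

Changing the weights changes the estimate of m, which is the point made in the paragraph above.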


Exercises

1. Fit a line to the following data by Case I:

    x    6    7    7    8    8    8    9    9   10
    y    5    5    4    5    4    3    4    3    3

        Ans. y = -.5x + 8.

2. Show that $\sum d = 0$ for Exercise 1.

3. Using the values given in (5) for m and k show that
   $\sum [y - (mx + k)] = 0$.

4. Verify the expressions for $m_2$ and b given in (7). How would you modify
   the “alternate procedure” so it will apply to $m_2$ and b?

5. Fit a line to the data of Example 2 by the method of Case II.

8. Show that the formulas in (5) fail when the x's are all equal. Hint.
   Replace x by a constant c in the denominator D.

9. Simplification. The formulas for m and k may be simplified.
For certain purposes it may be desirable to make the transformations
$x' = x - \bar{x}$ and $y' = y - \bar{y}$. This has the effect, graphically, of
translating the origin to the point $(\bar{x}, \bar{y})$ so that the y-axis is moved
to the value $\bar{x}$, and the x-axis is moved to the value $\bar{y}$. Let the
equation of the line with reference to these new axes be $y' = m_1x' + k_1$.
The formulas for $m_1$ and $k_1$ will be the same as for m and k except
that x will be replaced by x' and y by y'. Hence

    $m_1 = \dfrac{N\sum x'y' - \sum x' \sum y'}{N\sum x'^2 - (\sum x')^2}$

    $k_1 = \dfrac{\sum x'^2 \sum y' - \sum x' \sum x'y'}{N\sum x'^2 - (\sum x')^2}$

But since x' is a deviation from the mean of x, $\sum x' = 0$. Similarly,
$\sum y' = 0$. Hence the values of $m_1$ and $k_1$ reduce to

    (8)    $m_1 = \dfrac{\sum x'y'}{\sum x'^2}, \qquad k_1 = 0$.

Therefore the line goes through the new origin, and its equation is

    (9)    $y' = m_1x'$

where $m_1$ is defined in (8).

The above transformation may not lighten the computations unless
the values of x or y are equispaced. However, it does simplify
the theory in certain applications, particularly in correlation theory
(Chapter VIII).
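
The agreement between the general formulas and the simplified form (8) can be checked on a small data set. The Python sketch below is illustrative; both function names are hypothetical, and the general formulas are written in one algebraically equivalent arrangement of the usual least squares expressions for m and k.

```python
def fit_line_general(xs, ys):
    """Slope and intercept from sums of x, y, xy and x^2 (raw values)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    denom = n * sum_xx - sum_x ** 2     # zero when all x's are equal
    m = (n * sum_xy - sum_x * sum_y) / denom
    k = (sum_xx * sum_y - sum_x * sum_xy) / denom
    return m, k

def fit_line_centered(xs, ys):
    """Same line via deviations from the means, as in (8) and (9)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    dx = [x - x_bar for x in xs]
    dy = [y - y_bar for y in ys]
    m1 = sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)
    # y - y_bar = m1 * (x - x_bar), rewritten as y = m1*x + k.
    return m1, y_bar - m1 * x_bar

# Data of Exercise 1 above (answer given there: y = -.5x + 8).
xs = [6, 7, 7, 8, 8, 8, 9, 9, 10]
ys = [5, 5, 4, 5, 4, 3, 4, 3, 3]
print(fit_line_general(xs, ys))
print(fit_line_centered(xs, ys))
```

Both routes give the same slope and intercept; the centered form needs only two sums once the means are known.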

10. Time Series. If one of the variables is time, as in Examples 1
and 2, the data are called a time series. The best fitting line is then
commonly called a trend line or trend. In the process of fitting a
trend line, a first simplification, obviously, is to take the origin at one
of the given dates as we did in Example 3. But a much greater
simplification is possible, if the x's are equispaced, as they usually
are in a time series. Denote the common difference of the x's by c
and the mid-date by $\bar{x}$. Then we may shift the origin to $\bar{x}$ and change
the unit of measurement along the horizontal axis to c. Thus we may
let

    (10)    $t = \dfrac{x - \bar{x}}{c}$

where

    (11)    $\bar{x} = \dfrac{x_1 + x_N}{2}$

if the x's are equispaced.

Let us think now of our line in (t, y) coordinates, and let its
equation be y = at + b. Our problem is to find a and b numerically from
the given data, as we found m and k before. Our normal equations
will be

    $\sum y = \sum (at + b)$

    $\sum ty = \sum (at + b)t$.

Since $\sum t = \frac{1}{c}\sum (x - \bar{x}) = 0$, and $\sum b = Nb$, the above equations
are readily solved, giving

    (12)    $a = \dfrac{\sum ty}{\sum t^2}, \qquad b = \dfrac{\sum y}{N}$.

The student should remember that this simplification can be used
only when the x's are equispaced.

Example 5. Find the trend line for the following data. Here c = 5, and from
(11) $\bar{x} = 10$.

       x      y      t     ty     t²
       0     12     -2    -24      4
       5     15     -1    -15      1
      10     17      0      0      0
      15     22      1     22      1
      20     24      2     48      4
    Sums     90            31     10

From (12),

    $a = \dfrac{\sum ty}{\sum t^2} = \dfrac{31}{10} = 3.1, \qquad b = \dfrac{\sum y}{N} = \dfrac{90}{5} = 18$.

So the required equation is y = 3.1t + 18, with reference to the new origin and
units. If we wish it in terms of x, we substitute

    $t = \dfrac{x - 10}{5}$

and obtain y = .62x + 11.8.
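
The coded-time procedure of (10), (11) and (12) can be sketched in Python as follows. The function name `trend_line` is hypothetical, integer data are assumed, and exact fractions are used so that the results can be compared with hand computation.

```python
from fractions import Fraction

def trend_line(xs, ys):
    """Trend line for equispaced integer x's using t = (x - x_mid) / c.

    Returns (a, b, slope, intercept), where y = a*t + b in coded units
    and y = slope*x + intercept in the original units.
    """
    n = len(xs)
    c = xs[1] - xs[0]
    if c == 0 or any(xs[i + 1] - xs[i] != c for i in range(n - 1)):
        raise ValueError("the x's must be equispaced")
    x_mid = Fraction(xs[0] + xs[-1], 2)          # formula (11)
    ts = [(x - x_mid) / c for x in xs]           # formula (10)
    a = sum(t * y for t, y in zip(ts, ys)) / sum(t * t for t in ts)
    b = Fraction(sum(ys), n)                     # formula (12)
    slope = a / c
    intercept = b - slope * x_mid
    return a, b, slope, intercept

# Data of Example 5.
a, b, slope, intercept = trend_line([0, 5, 10, 15, 20], [12, 15, 17, 22, 24])
print(a, b, float(slope), float(intercept))
```

For the data of Example 5 this gives a = 31/10 and b = 18, and in terms of x the slope 0.62 and intercept 11.8. An even number of observations gives fractional values of t, which the exact arithmetic handles without change.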

Example 6. Same as Example 5, with another observation added. Note that
when there is an even number of observations, the values of t are fractional.
In this case it is convenient to use the column headings 2ty instead of ty, and
4t² instead of t².

       x      y       t     2ty    4t²
       0     12    -5/2     -60     25
       5     15    -3/2     -45      9
      10     17    -1/2     -17      1
      15     22     1/2      22      1
      20     24     3/2      72      9
      25     30     5/2     150     25
    Sums    120             122     70

    $\bar{x} = 12.5, \qquad \sum ty = 61, \qquad \sum t^2 = 17.5$

    a = 3.49,  b = 20

    y = 3.49t + 20

    $y = 3.49\left(\dfrac{x - 12.5}{5}\right) + 20$

    y = .7x + 11.28.

11. Exponential Trends. When the given y values form a geometric
progression while the corresponding x values form an arithmetic
progression, the relationship between the variables is given
by an exponential function, and the best fitting curve is said to
describe an exponential trend. Data from the fields of biology,
banking, and economics frequently exhibit such a trend. Thus the
growth of bacteria is exponential. Money accumulating at compound
interest follows the same kind of law of growth. And in business,
sales or earnings may grow exponentially over a short period.
Another familiar example is the increase in friction as a rope is
coiled around a post. As the number of coils increases in arithmetic
progression, the friction increases in geometric progression.¹
This explains why a few turns of the hawsers around the bitts at the
wharf are sufficient to hold a large ship.

The characteristic property of this law is that the rate of growth,
that is, the rate of change of y with respect to x, at any value of x is
proportional to the value of the function for that value of x. The
function

    (13)    $y = Ae^{Bx}$

has this property.² The letter e is a fixed constant, whereas A and
B are parameters to be determined from the data. If y decreases
as x increases, B is negative. An interesting example of this case is
the disappearance of radioactive substances like radium.

Fig. 28. General Appearance of the Graph of (13) for x ≥ 0 and A > 0.

To assume that the apparent law of growth will continue is usually
unwarranted, so only short range predictions can be made with any
considerable degree of reliability. When the exponential character
of the observed phenomenon ceases a saturation point is said to be
reached.

¹ Elementary Mathematical Analysis, C. S. Slichter. McGraw-Hill.

² The student of calculus will understand that “rate of change” is used here in
the derivative sense. For (13), dy/dx = By.

The parameters A and B. If we transform (13) so that it is linear
with respect to its parameters we may use the methods for fitting
a straight line to determine A and B. To this end we first take the
logarithms (to base 10) of both sides of (13), obtaining

    (14)    $\log y = \log A + (B \log e)x$

which is of the form

    (15)    $Y = k + mx$

where Y = log y, k = log A, m = B log e.

If we look up the logarithms of the given y's and denote them by Y,
we may fit the equation Y = mx + k to the (x, Y) values by determining
m and k by means of the formulas given in (5). In using
these formulas we must remember to replace y by Y. After m and k
are determined, A and B may be obtained from the relations

    A = anti-log of k

    B = m/log e, where log e = log 2.718 = .4343.

The student may be interested to verify that the relation Y = mx + k
can be put back into the form (13). We may write (14) in the form

    $y = 10^{\log A + (B \log e)x}$
    $\;\; = \{10^{\log A}\}\{10^{\log e}\}^{Bx}$
    $\;\; = Ae^{Bx}$.

The last step follows because $10^{\log_{10} N} = N$ by definition of logarithm.

Example 7. Find the exponential trend for the following data, and draw the
curve.

       x        y        Y         xY      x²
       1       1.6     .2041     .2041      1
       2       4.5     .6532    1.3064      4
       3      13.8    1.1399    3.4197      9
       4      40.2    1.6042    6.4168     16
       5     125.0    2.0969   10.4845     25
    Sums 15           5.6983   21.8315     55

From (5) we have,

    $D = (\sum x)^2 - N\sum x^2$

    $m = \dfrac{1}{D}\left[\sum Y \sum x - N\sum xY\right]$

    $k = \dfrac{1}{D}\left[\sum x \sum xY - \sum Y \sum x^2\right]$.

Therefore,

    $D = [(15)^2 - 5(55)] = -50$

    $m = -\dfrac{1}{50}[(5.6983)(15) - 5(21.8315)] = .4737$

    $k = -\dfrac{1}{50}[15(21.8315) - (5.6983)(55)] = -.2813 = 9.7187 - 10$.

And

    log A = 9.7187 - 10, hence A = .5232

    $B = \dfrac{m}{.4343} = 1.091$.

Therefore the required equation is

    $y = .5232e^{1.091x}$.
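
The logarithmic fit of Example 7 can be reproduced with a short Python sketch. The function name `fit_exponential` is hypothetical. The sketch uses full-precision common logarithms rather than a four-place table, so the last digit may differ slightly from the hand computation.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = A*e^(B*x) by fitting a straight line to Y = log10(y).

    Note that this minimizes squared errors in log y, not in y itself.
    """
    n = len(xs)
    Y = [math.log10(y) for y in ys]
    sum_x, sum_Y = sum(xs), sum(Y)
    sum_xY = sum(x * v for x, v in zip(xs, Y))
    sum_xx = sum(x * x for x in xs)
    D = sum_x ** 2 - n * sum_xx                 # D as written in Example 7
    m = (sum_Y * sum_x - n * sum_xY) / D
    k = (sum_x * sum_xY - sum_Y * sum_xx) / D
    A = 10 ** k                                 # A = anti-log of k
    B = m / math.log10(math.e)                  # log e = .4343
    return A, B

A, B = fit_exponential([1, 2, 3, 4, 5], [1.6, 4.5, 13.8, 40.2, 125.0])
print(round(A, 4), round(B, 3))
```

For the data of Example 7 the sketch returns values close to A = .523 and B = 1.091, in agreement with the worked solution.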


When the x's are equispaced, as here, the work may be simplified by using (10)
and fitting a line¹

    Y = at + b.

The problem now is essentially the same as in §10 where a and b are defined in
(12) except that we are now dealing with (t, Y) coordinates instead of (t, y).
The method is illustrated below.

        t        Y         tY      t²
       -2      .2041    -.4082      4
       -1      .6532    -.6532      1
        0     1.1399    0.0000      0
        1     1.6042    1.6042      1
        2     2.0969    4.1938      4
    Sums      5.6983    4.7366     10      (t = x - 3)

From (12)

    $a = \dfrac{\sum tY}{\sum t^2} = \dfrac{4.7366}{10} = .4737$

    $b = \dfrac{1}{N}\sum Y = \dfrac{5.6983}{5} = 1.1397$.

So

    Y = .4737t + 1.1397.

Transforming this into (x, Y) coordinates we have

    Y = .4737(x - 3) + 1.1397
      = .4737x - .2814

as before.

¹ The critical reader will realize that fitting a straight line to the values of log y
is not quite the same as fitting an exponential to the values of y. However, the
discrepancy usually does not affect the fit seriously. For a method which is free
from this difficulty, see Glover's Tables, p. 468.

For purposes of plotting, predicting, or interpolating, values of y in (13) may
be obtained by means of the intermediate form (15). So, to sketch the curve
for this example, we first assign values to x in the last equation, compute the
corresponding values of Y, and then obtain the values of y from a table of
logarithms. These values are given in the following table. The curve in Figure 29
is sketched from the (x, y) values in this table.

Fig. 29

    x      1        2        3        4        5        6
    Y    0.1923   0.6660   1.1397   1.6134   2.0871   2.5608
    y    1.56     4.63    13.79    41.06   122.2    363.8

12. Further Remarks on the Exponential Function. Equation (13)
is sometimes called the compound interest law because it describes
the way money would grow if interest were compounded continuously.
If P dollars are invested at a nominal rate j% compounded
m times a year, the amount S after x years is given by the formula

    $S = P\left(1 + \dfrac{j}{m}\right)^{mx}$.

If j is compounded continuously or, in other words, if m is taken
indefinitely large (written $m \to \infty$), the amount S does not increase
indefinitely but approaches a limiting value. We may write the
expression for S in the form

    $S = P\left[\left(1 + \dfrac{j}{m}\right)^{m/j}\right]^{jx}$.

If we let N = m/j, we have

    $S = P\left[\left(1 + \dfrac{1}{N}\right)^{N}\right]^{jx}$.

It can be shown in the calculus¹ that, as $N \to \infty$, the quantity
$\left(1 + \frac{1}{N}\right)^N$ approaches the limit called e. Thus we have

    $\lim_{N \to \infty} \left(1 + \dfrac{1}{N}\right)^N = e = 2.718\cdots$

This limit is also the base of the Napierian, or natural, system of
logarithms. As $m \to \infty$ so does $N \to \infty$. Therefore in the ideal case
of continuous conversion of interest, we have the limiting form

    $S = \lim_{m \to \infty} P\left[\left(1 + \dfrac{j}{m}\right)^{m/j}\right]^{jx}$

that is

    $S = Pe^{jx}$

which is of the form (13).
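
The approach to the limiting value can be seen numerically. In the Python sketch below the function names and the sample figures (P = 100, j = 0.06, x = 10) are illustrative assumptions, and j is taken as a decimal rate per year.

```python
import math

def amount(P, j, m, x):
    """Amount after x years at nominal rate j compounded m times a year."""
    return P * (1 + j / m) ** (m * x)

def amount_continuous(P, j, x):
    """Limiting amount under continuous compounding: S = P*e^(j*x)."""
    return P * math.exp(j * x)

# Illustrative figures only: P = 100, j = 6 per cent, x = 10 years.
for m in (1, 2, 12, 365, 10 ** 6):
    print(m, amount(100, 0.06, m, 10))
print("continuous", amount_continuous(100, 0.06, 10))
```

The printed amounts increase with m but remain below the continuous value, which they approach as m grows.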

There are several other forms of the exponential function. For
example, if we let $r = e^B$, (13) becomes

    $y = Ar^x$

which is the general term of a geometric progression whose first term
is A and common ratio is r.

If B is negative in $r = e^B$ then r < 1. So (13) is a decreasing
function when B is negative.

If we let $10^k = e^B$, (13) becomes

    $y = A \cdot 10^{kx}$.

Then $k = B \log_{10} e$ and k differs from B by the factor $\log_{10} e$. This
factor is known as the modulus of the system of logarithms of base 10
with respect to the system of base e.

The value of the reciprocal of the modulus

    $\dfrac{1}{\log_{10} e} = 2.3025851\cdots$

is often useful. For example, suppose that the logarithm to base e
is required for a given number N and tables to base 10 only are
available. Let $\log_e N = x$. Then $e^x = N$, and $x \log_{10} e = \log_{10} N$,
whence $x = \log_{10} N / \log_{10} e = 2.303 \log_{10} N$. (Hereafter, the base 10
will be understood unless otherwise indicated.)

¹ The teacher can give appropriate references.
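
The use of the modulus to pass from common to natural logarithms can be sketched in a few lines of Python. The names `MODULUS` and `natural_log_from_common` are hypothetical, and the library function `math.log` is used only as an independent check.

```python
import math

MODULUS = math.log10(math.e)          # log10(e), about 0.4343

def natural_log_from_common(n):
    """Natural logarithm of n obtained from its base-10 logarithm."""
    return math.log10(n) / MODULUS    # same as 2.3025851... * log10(n)

print(1 / MODULUS)                                  # reciprocal of the modulus
print(natural_log_from_common(7.5), math.log(7.5))  # the two should agree
```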

13. Ratio Charts. In the graphical representation of data that
exhibit an exponential trend, it is often desirable to use
semi-logarithmic paper. Such paper has a logarithmic scale in the vertical
direction and a uniform scale in the horizontal direction. (Figure 30.) A
logarithmic scale is one in which the distance from y = 1 to y = N
equals log N. A “cycle” of rulings spaced according to the logarithms
of the integers from 1 to 10 is the unit of the vertical log y
scale.

“Semi-log” paper may be constructed or purchased having one
or more cycles. The appropriate number of cycles is determined
by the range of y values in the data to be plotted. If the bottom line
of the first cycle is labeled 1 and taken as the origin of log y
(log 1 = 0), the beginning of the next cycle is read 10 (log 10 = 1), the
next one above that is read 100 (log 100 = 2), etc. However, the
beginning of the first cycle may be labeled with any number which
is an integral power (positive or negative) of 10, as .01, .1, 10, 100, etc.
Corresponding lines in successive cycles are labeled with numbers
which are 10 times those in the preceding cycle. Since y has no real
logarithm if y ≤ 0, neither zero nor negative numbers are found on
a logarithmic scale. Plotting a point whose semi-logarithmic
coordinates are (x, y) is equivalent to plotting the point whose
rectangular coordinates are (x, log y).

Example 8. Plot $y = 8(2^x)$ on semi-log paper.

Solution. Assigning values to x we form the following table,

    x    -3    -2    -1     0     1     2     3     4
    y     1     2     4     8    16    32    64   128

from which we obtain the semi-logarithmic graph shown in Figure 30.

We now state the following theorem.

Theorem II. If A is a positive constant, the (x, log y)-graph of
$y = Ae^{Bx}$ is a straight line.

Proof: Since (15) is linear in x and Y, its graph in (x, Y)
rectangular coordinates is a straight line.

Semi-logarithmic graphs are also called ratio charts. Their
usefulness depends upon the property of logarithms that

    $\log \dfrac{M}{N} = \log M - \log N$.

It follows that the distance between any two ordinates of the chart
measures the ratio between the values represented by these ordinates.
Thus if

    $\dfrac{y_1}{y_2} = \dfrac{y_3}{y_4}$

then

    $\log y_1 - \log y_2 = \log y_3 - \log y_4$

or

    $Y_1 - Y_2 = Y_3 - Y_4$,

that is, equal ratios are represented by equal vertical distances.
Likewise, if

    $\dfrac{y_1}{y_2} > \dfrac{y_3}{y_4}$

then

    $Y_1 - Y_2 > Y_3 - Y_4$

and the larger ratio is represented graphically by the larger distance.
These differences of elevation are independent of any base line.
The same percentage increase in y is represented by the same addition
to the height of Y in all parts of the chart. Hence, it is easier to
depict and discover percentage changes on ratio charts than on
ordinary charts.

The analysis of time series in economic statistics is often facilitated
by forming “link relatives” which are ratios of each ordinate (after
the first) to the preceding ordinate. Thus, if $y_1, y_2, \dots, y_n$ are the
given values, the link relatives are

    $R_1 = \dfrac{y_2}{y_1}, \quad R_2 = \dfrac{y_3}{y_2}, \quad \dots, \quad R_{n-1} = \dfrac{y_n}{y_{n-1}}$.

Any link relative R denotes the percentage change in y from one
month (say) to the next. If the y's are plotted on ratio paper they
will lie on a straight line when the R's are equal, on a curve bending
upward when the R's are increasing, and on a curve bending
downward when the R's are decreasing. It follows that if two curves are
parallel on ratio paper their rate of increase (or decrease) is the same.
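
Link relatives are simple to compute. The Python sketch below is illustrative, with hypothetical function names; it also computes the vertical steps on a ratio chart (differences of base-10 logarithms) to show that equal link relatives correspond to equal steps.

```python
import math

def link_relatives(ys):
    """Ratio of each value (after the first) to the preceding value."""
    return [later / earlier for earlier, later in zip(ys, ys[1:])]

def log_steps(ys):
    """Vertical distances between successive points on a ratio chart."""
    return [math.log10(later) - math.log10(earlier)
            for earlier, later in zip(ys, ys[1:])]

ys = [8, 16, 32, 64, 128]     # values of y = 8(2^x) for x = 0, 1, 2, 3, 4
print(link_relatives(ys))     # every link relative equals 2
print(log_steps(ys))          # every step equals log 2, about .3010
```

Because the link relatives here are all equal, the points lie on a straight line when plotted on ratio paper.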

For further discussion of ratio charts the student is referred to the 
books of Bivins and Haskell (see §7, Introduction). 

Graphical determination of exponential function. It follows from
Theorem II that data giving a straight line when plotted on
semi-logarithmic paper (with x on the uniform and y on the logarithmic
scale) satisfy an equation of the form (13). Suppose that the
(straight line) graph has been drawn and one desires the exponential
function which the line represents and the data satisfy. The
constants A and B in (13) can be approximated by the following method.¹
We first observe that the slope of the line represented by (15) is
given by

    $m = B \log e = \dfrac{Y_2 - Y_1}{x_2 - x_1}$.

To determine the numerical value of B, take one cycle of y (over
which the graph extends) from any starting point and read the
corresponding values of x (Figure 31a), so that

    $B = \dfrac{Y_2 - Y_1}{(x_2 - x_1)\log e} = \dfrac{\log (y_2/y_1)}{\Delta x \log e} = \dfrac{\log 10}{\Delta x \log e} = \dfrac{2.303}{\Delta x}$.

Fig. 31, panels (a) and (b)

In case the graph does not extend over one cycle, determine x for
y = e and y = 1; then (Figure 31b)

    $B = \dfrac{\log e}{\Delta x \log e} = \dfrac{1}{\Delta x}$.

The sign of B is of course positive if the graph has a positive slope
in the ordinary sense and is negative for a negative slope.

If the graph intersects the line x = 0, the value of A can be read
off at this intersection. If, in the data involved, the graph does not
intersect the line x = 0, A can usually be determined by finding y
for some convenient value of x such as Bx = some integer n,
whereupon $A = y/e^n$ from equation (13).

In practical problems, the plotted points representing the data
will not usually fall exactly on a straight line. But if they exhibit
a linear trend one may draw (with the aid of a transparent ruler)
the line that seems to fit them best. Then proceed as above.

¹ Note on Semi-Logarithmic Graphs, W. T. Lenser, The American
Mathematical Monthly, vol. 49 (1942), pp. 611-613.

Example 9. The uniform scale along the horizontal axis of a sheet of
semi-logarithmic paper ranges from 0 to 10; along the vertical axis the logarithmic
scale ranges from 100 to 1000. A straight line is drawn on this paper from the
upper endpoint of the vertical scale to the midpoint of the horizontal scale.
Determine (i) the equation of the exponential function represented by the line,
(ii) the equation of the line in (x, Y) coordinates.

Solution 1, using above method. A = 1000. B = -1/(5 log e) = (-2.3)/5
= -0.46. Hence, the desired equation (i) is

    $y = 1000e^{-0.46x}$.

The slope of the line is m = B log e = -1/5 and its equation (ii) is

    Y = 3 - 0.2x.

Solution 2. The line goes through the points (0, 1000) and (5, 100).
Substitution of the first pair of coordinates into (13) gives A = 1000. Substitution
of the second pair into $y = 1000e^{Bx}$ gives $100 = 1000e^{5B}$. Then $e^{-5B} = 10$ and
$-5B = \log_e 10 = 2.303$, whence B = -0.46.
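
The second solution of Example 9 amounts to passing an exponential through two points, and it can be sketched in Python as follows. The function name `exponential_through` is hypothetical, and natural logarithms are used directly in place of the table of common logarithms.

```python
import math

def exponential_through(p1, p2):
    """Constants A and B of y = A*e^(B*x) through two points with y > 0."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points must have different x values")
    B = math.log(y2 / y1) / (x2 - x1)   # slope of log_e y against x
    A = y1 / math.exp(B * x1)           # from y1 = A*e^(B*x1)
    return A, B

# The two points of Example 9.
A, B = exponential_through((0, 1000), (5, 100))
print(A, round(B, 2))
```

For the points (0, 1000) and (5, 100) the sketch gives A = 1000 and B close to -0.46, matching both solutions above.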

14. Logarithmic Coordinate Paper. A function of the form

    (16)    $y = kx^m$

is called a power function. If k > 0 we have

    (17)    $Y = K + mX$

where the capital letters denote the logarithms of the corresponding
lower-case letters. Form (17) suggests the usefulness of logarithmic
coordinate paper on which the rulings in both directions are at
distances from the origin that are proportional to the logarithms of
the numbers represented. To mark on this paper a point whose
ordinary coordinates are $(X_1, Y_1)$ we plot the point whose rulings
correspond to the numbers $x_1$ and $y_1$.

It is evident from (17) that the graph of (16) is a straight line on
logarithmic coordinate paper. It also follows from (17) that the
problem of fitting a curve of the form (16) to a set of observations
can be reduced to the problem of fitting a straight line.

Example 10. A straight line is drawn on logarithmic coordinate paper through
the points (4, 16) and (6, 54). Determine the function y = f(x) which has that
line as its graph.

Solution 1. Substitution of the coordinates of the given points into (16) gives

    $16 = k \cdot 4^m$
    $54 = k \cdot 6^m$.

Upon dividing each member of the first equation by the corresponding member
of the second, we obtain $8/27 = (2/3)^m$ whence by inspection m = 3. Then
k = 1/4, and the required function is $4y = x^3$.

Solution 2. Substitution of the logarithms of the given coordinates into (17)
gives

    1.20412 = K + 0.60206m
    1.73239 = K + 0.77815m.

Solving, m = 3 and K = -.60206 = 9.39794 - 10, k = .25.
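
The second solution of Example 10 can be sketched in Python. The function name `power_through` is hypothetical; the two points are assumed to have positive coordinates so that their logarithms exist.

```python
import math

def power_through(p1, p2):
    """Constants k and m of y = k*x^m through two points (x, y > 0)."""
    (x1, y1), (x2, y2) = p1, p2
    X1, Y1 = math.log10(x1), math.log10(y1)
    X2, Y2 = math.log10(x2), math.log10(y2)
    if X1 == X2:
        raise ValueError("the two points must have different x values")
    m = (Y2 - Y1) / (X2 - X1)    # slope of the line Y = K + m*X
    K = Y1 - m * X1              # intercept K = log k
    return 10 ** K, m

# The two points of Example 10.
k, m = power_through((4, 16), (6, 54))
print(round(k, 4), round(m, 4))
```

For the points (4, 16) and (6, 54) this gives k = .25 and m = 3 up to rounding, as found above.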


15. Parabolic Trend. Data of broad economic or social significance
extending over a long period of years may often be described
by an arc of a second degree parabola. The equation of a parabola
is of the form

    $y = \alpha + \beta x + \gamma x^2$

where $\alpha$, $\beta$, $\gamma$ are the parameters to be determined.

If the x's are equispaced we may let

    $t = \dfrac{x - \bar{x}}{c}$

where $\bar{x} = (x_1 + x_N)/2$ and $c = |x_{i+1} - x_i|$, and thereby effect
considerable simplification in evaluating the constants. In t and y
coordinates the equation will, of course, involve different constants
and we may write it in the form

    (18)    $y = A + Bt + Ct^2$.

The method of moments may again be used and since (18) is a
polynomial this method also gives the best fitting curve in a least squares
sense. Because there are three constants to be determined we must
equate the second moments as well as the zeroth and first moments.
Imposing these conditions of moments between the observed and
computed ordinates, we obtain the three normal equations:

    $\sum y = NA + B\sum t + C\sum t^2$

    $\sum ty = A\sum t + B\sum t^2 + C\sum t^3$

    $\sum t^2y = A\sum t^2 + B\sum t^3 + C\sum t^4$.

Since the mean is chosen as origin $\sum t = 0$. With this choice of
origin and because the x's are equispaced it can be shown that
$\sum t^3 = 0$. Therefore the normal equations simplify into

            $B = \dfrac{\sum ty}{\sum t^2}$

    (19)    $AN + C\sum t^2 = \sum y$

            $A\sum t^2 + C\sum t^4 = \sum t^2y$.

When the summations involved in these equations are evaluated
from the data the values of A, B, and C can easily be determined.

Example 11. Fit a parabola to the following data.

Number of Divorces per 1000 Marriages in the United States, 1900-1930

    Year      y      x      t      ty     t²     t²y     t⁴
    1900     81      0     -3    -243      9     729     81
    1905     84      5     -2    -168      4     336     16
    1910     88     10     -1     -88      1      88      1
    1915    104     15      0       0      0       0      0
    1920    134     20      1     134      1     134      1
    1925    148     25      2     296      4     592     16
    1930    170     30      3     510      9    1530     81
    Sums    809   x̄ = 15          441     28    3409    196

From (19),

    $B = \dfrac{441}{28}$

    7A + 28C = 809
    28A + 196C = 3409.

Solving the last two equations simultaneously we obtain,

    $A = \dfrac{322}{3}, \qquad C = \dfrac{173}{84}$.

Therefore,

    $y = \dfrac{322}{3} + \dfrac{441}{28}t + \dfrac{173}{84}t^2$.

If we desire the equation in the original form we substitute $t = \frac{1}{5}(x - 15)$ and
obtain

    $y = \dfrac{322}{3} + \dfrac{441}{28}\left(\dfrac{x - 15}{5}\right) + \dfrac{173}{84}\left(\dfrac{x - 15}{5}\right)^2$

which simplifies into

    $y = 78.62 + .68x + .0824x^2$.

Upon the hypothesis that divorces will continue to increase according to this
trend, we may estimate the number for 1950 for example. When x = 50 in
the above equation, we find y = 318.62.
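
The simplified normal equations (19) can be solved mechanically. The Python sketch below is an illustration with a hypothetical function name; it assumes integer data and coded times t that are symmetric about zero, and it uses exact fractions so that the constants can be compared with the hand computation.

```python
from fractions import Fraction

def parabolic_trend(ts, ys):
    """Constants A, B, C of y = A + B*t + C*t^2 from equations (19).

    Assumes integer inputs with sum(t) = 0 and sum(t^3) = 0.
    """
    n = len(ts)
    if sum(ts) != 0 or sum(t ** 3 for t in ts) != 0:
        raise ValueError("t values must be symmetric about zero")
    sum_t2 = sum(t ** 2 for t in ts)
    sum_t4 = sum(t ** 4 for t in ts)
    sum_y = sum(ys)
    sum_ty = sum(t * y for t, y in zip(ts, ys))
    sum_t2y = sum(t * t * y for t, y in zip(ts, ys))

    B = Fraction(sum_ty, sum_t2)
    # Solve  N*A + sum_t2*C = sum_y  and  sum_t2*A + sum_t4*C = sum_t2y.
    det = n * sum_t4 - sum_t2 ** 2
    A = Fraction(sum_y * sum_t4 - sum_t2 * sum_t2y, det)
    C = Fraction(n * sum_t2y - sum_t2 * sum_y, det)
    return A, B, C

# Data of Example 11 (t = -3, ..., 3 for the years 1900, ..., 1930).
A, B, C = parabolic_trend([-3, -2, -1, 0, 1, 2, 3],
                          [81, 84, 88, 104, 134, 148, 170])
print(A, B, C)
print(float(A + 7 * B + 49 * C))   # value at t = 7, that is, x = 50
```

The sketch reproduces A = 322/3 and C = 173/84, with B = 441/28 in lowest terms. Evaluated exactly at t = 7 the parabola gives 318.5; the figure 318.62 in the example comes from using the rounded coefficients of the equation in x.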

16. The Gompertz Curve. The curve which bears his name was
suggested in 1825 by Gompertz for use in actuarial science. Recently
it has had some application as a growth curve in business and
population forecasting and in certain problems in education. Its
equation¹ is

    (20)    $y = kg^{c^x}$.

To determine the parameters, we first transform (20) into the
logarithmic form

    (20a)    $Y = K + Gc^x$

where Y = log y, K = log k, G = log g. The number, N, of
observations available must be such that N = 3n where n is the number in
each of three subgroups with no observations omitted; that is, N
must be divided into three blocks of data consisting of n items each.
It is also necessary that the values of the independent variable x be
equispaced. Then the origin can be chosen so that x takes the
values 0, 1, 2, ..., 3n - 1. If these values of x are substituted in
(20a) we obtain the three sets of functional Y's shown in (a), (b),
and (c).

    (a)  x = 0, 1, ..., n - 1:
         $Y_0 = K + Gc^0, \quad Y_1 = K + Gc, \quad \dots, \quad Y_{n-1} = K + Gc^{n-1}$,
         with sum $\sum_{i=0}^{n-1} Y_i$

    (b)  x = n, n + 1, ..., 2n - 1:
         $Y_n = K + Gc^n, \quad Y_{n+1} = K + Gc^{n+1}, \quad \dots, \quad Y_{2n-1} = K + Gc^{2n-1}$,
         with sum $\sum_{i=n}^{2n-1} Y_i$

    (c)  x = 2n, 2n + 1, ..., 3n - 1:
         $Y_{2n} = K + Gc^{2n}, \quad Y_{2n+1} = K + Gc^{2n+1}, \quad \dots, \quad Y_{3n-1} = K + Gc^{3n-1}$,
         with sum $\sum_{i=2n}^{3n-1} Y_i$

Let $S_1$, $S_2$, $S_3$ denote respectively the totals of the subgroups (a),
(b), and (c). Thus we have

    $S_1 = nK + G(1 + c + \cdots + c^{n-1})$

    $S_2 = nK + Gc^n(1 + c + \cdots + c^{n-1})$

    $S_3 = nK + Gc^{2n}(1 + c + \cdots + c^{n-1})$.

Then

    $S_2 - S_1 = G(c^n - 1)(1 + c + \cdots + c^{n-1})$

    $S_3 - S_2 = Gc^n(c^n - 1)(1 + c + \cdots + c^{n-1})$

whence we obtain

    $c^n = \dfrac{S_3 - S_2}{S_2 - S_1}$.

Writing the expression for $S_2 - S_1$ in the form

    $S_2 - S_1 = G\,\dfrac{(c^n - 1)^2}{c - 1}$

and solving for G, we obtain

    $G = \dfrac{(S_2 - S_1)(c - 1)}{(c^n - 1)^2}$.

The expression for $S_1$ may be written

    $S_1 = nK + \dfrac{G(1 - c^n)}{1 - c}$

so we have

    $K = \dfrac{1}{n}\left[S_1 - \dfrac{G(1 - c^n)}{1 - c}\right]$.

In the above expressions, $S_1$, $S_2$, $S_3$ denote sums of the functional
Y's. If these are now replaced by the empirical data so that

    $S_1 = \sum_{i=0}^{n-1} Y_i, \qquad S_2 = \sum_{i=n}^{2n-1} Y_i, \qquad S_3 = \sum_{i=2n}^{3n-1} Y_i$,

where $Y_i$ refers to the observed Y's, then c can be determined
from the expression for $c^n$. Using the value of c, G can be
determined, and then K.

If c < 1, it is clear from (20a) that $Y \to K$ as $x \to \infty$. Then
y = k is an asymptote and k is sometimes called the ceiling of the
curve. (See Figure 32.)

For an application of the above method to a problem in business,
see Statistical Methods (Revised Edition) by Mills, page 672.

¹ For a derivation see Mathematical Theory of Life Insurance, Forsyth.
John Wiley and Sons, Inc.
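
The three-group method can be sketched in Python as follows. The function name `fit_gompertz` is hypothetical, and the test data are synthetic values generated from a known Gompertz curve (k = 100, g = 0.1, c = 0.5), chosen only to show that the method recovers the constants; they are not taken from the text.

```python
import math

def fit_gompertz(ys):
    """Constants k, g, c of y = k * g**(c**x) for x = 0, 1, ..., 3n - 1.

    Uses the sums S1, S2, S3 of the logarithms in three equal blocks.
    """
    total = len(ys)
    if total == 0 or total % 3 != 0:
        raise ValueError("the number of observations must be 3n")
    n = total // 3
    Y = [math.log10(y) for y in ys]
    S1, S2, S3 = sum(Y[:n]), sum(Y[n:2 * n]), sum(Y[2 * n:])
    if S2 == S1:
        raise ValueError("S2 = S1: c cannot be determined")
    ratio = (S3 - S2) / (S2 - S1)            # this is c**n
    if ratio <= 0 or ratio == 1:
        raise ValueError("data do not determine a usable value of c")
    c = ratio ** (1.0 / n)
    G = (S2 - S1) * (c - 1) / (c ** n - 1) ** 2
    K = (S1 - G * (1 - c ** n) / (1 - c)) / n
    return 10 ** K, 10 ** G, c

# Synthetic check: six values generated from k = 100, g = 0.1, c = 0.5.
ys = [100 * 0.1 ** (0.5 ** x) for x in range(6)]
print(fit_gompertz(ys))
```

With real data the observed sums only approximate the functional sums, so the recovered constants are estimates rather than exact values.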
17. Remarks and References. The methods of least squares and
moments do not select the appropriate curve. They merely determine
the “best” values of the parameters in the equation of the
curve which has been selected previously to describe the observed
data. The question of the type of curve which should be fitted to the
data is not always easy to answer. The selection of the appropriate
mathematical function depends to a large extent upon the investigator's
experience in the field in which the problem lies and his knowledge
of the properties of curves. It always helps to plot the data first.
The usual requirements for practical purposes are that (a) the curve
must represent well the trend of the empirical data, and (b) the
mathematical expression must not involve too many parameters and
those present must be calculable from the data. In dealing with
time series, if the objective is to find out what would happen if the
percentage change should continue as it has on the average in the past,
then an exponential trend is indicated. If the objective is to find out
what would happen if the yearly (or monthly, etc.) change should
continue as it has in the past, a straight line trend is indicated.

We will merely mention here two other important curves which
require more advanced mathematics in their treatment. The logistic,
or so-called Reed-Pearl curve, is used extensively in studying various
growth phenomena. Its function is of the form

    $y = \dfrac{1}{a + bc^x}$

and it resembles somewhat the Gompertz curve discussed above.
For further discussion of this curve and methods of fitting it see

1. Elements of Statistics, Davis and Nelson.

2. Statistical Methods, Revised, Mills.

The function

    $y = ks^x g^{c^x}$

is known as Makeham's law. It is used in actuarial work. The
student having a working knowledge of the calculus will find an
interesting discussion of its use in the field of insurance in an article
entitled Makeham's Laws of Mortality, Rietz, American Mathematical
Monthly, vol. 28, p. 471.

The logistic curve was used in studies on the rate of growth of the
population of the United States. But its usefulness in this connection
fell somewhat short, apparently,¹ of the claims of its sponsors.
Two other references relating to the population of our country may
appropriately be mentioned here. Although they do not involve
problems of curve fitting they do afford instructive examples of the
application of scientific method to social and political problems.
They are

1. Bibliography on Methods of Apportionment in Congress, E. V. Huntington.
American Mathematical Monthly, vol. 49 (1942), pp. 115-117.

2. Determination of the Center of Population in the United States. School
Science and Mathematics, May and June, 1942.

Exercises 

1. If the rate of change of y with respect to x is always proportional to the
   attained value of y then y is what kind of a function of x?

2. Determine A and B in the best fitting curve of the type (13) for the following
   data.

          Data            Form for Computations
        x      y         t      Y      tY     t²
        0
        5
       10
       15
       20


3. (a) Prove formula (11). 

( b ) Graph the curve y = 10<r**. 

4. Find the best fitting parabola for the following points: (-4, 2), (0, 8), (4, 9),
   (8, 11), (12, 8), (16, 5). Ans. $y = 7.2 + .94x - .07x^2$.

¹ Differential Equations Subject to Error, and Population Estimates, Harold
Hotelling. Jour. Amer. Stat. Assoc., vol. 22 (1927), pp. 283-314.

5. If the values of t form an arithmetic progression and $\sum t = 0$ prove that
   $\sum t^3 = 0$.

6. (a) Add the values x = 30, y = 37 to the data of Example 6 and find the
   trend line. Ans. y = .8x + 10.43.

   (b) On the hypothesis that the apparent trend continues, predict the value
   of y when x = 35.

7. In a tensile test of a metal bar the following observations were made, where
   x represents the load in tons and y the elongation in ten-thousandths of
   an inch:

    x     1     2     3     4     5
    y    14    27    40    55    68

   Determine a linear relation between x and y by the theory of least squares.

8. In the following table y represents the fire losses in the United States in
   millions of dollars. Taking the origin of x at 1915 find the best fitting
   line, in a least squares sense, for the data.

    x   1915   1917   1919   1921   1923   1925
    y    172    290    321    495    535    570

9. (a) Add the values x = 6, y = 300 to the data of Example 7 (p. 153) and find
   the equation of the best fitting exponential curve.

        Ans. Y = .4617x - .2534
             $y = .56e^{1.06x}$.

   (b) Plot the given data and the curve obtained in (a) on semi-log paper.

10. Distinguish between the forms of the curves represented by the functions
    $y = Ae^{-Bx}$ and $y = Ke^{-h^2x^2}$ where A, B, K, and h are positive real
    numbers. If these functions were plotted on semi-log paper what kind of
    curves would be obtained?

11. Determine by inspection the value of (a) $10^{\log_{10} e}$, (b) $a^{\log_a N}$.

12 . Solve for x: logi 0 (x 2 ) = (logiox)(log«x). 

13. Solve for x: logi 0 (x 2 ) — logi 0 (x/10) =* 2. 

14 . Determine a number x such that the square of log x exceeds log x by 2. 

(Logs to base 10. Two answers.) 

15. On semi-logarithmic coordinate paper, a straight line is drawn through the 
points (2, 1) and (4, 100). Determine the function which has that line 
as its graph. Hint. Use the form $y = Ar^x$. Ans. $100y = 10^x$. 

16. Same as exercise 15 for the points (1, 6) and (2, 18). Ans. $y = 2(3^x)$. 

17. On logarithmic coordinate paper, a straight line is drawn through the points 
(2, 12) and (3, 27). Determine the function which has that line as its 
graph. Ans. $y = 3x^2$. 

18. Data from a certain experiment involving voltage (v) as a function of time 
(t) are plotted on logarithmic coordinate paper, and are found to exhibit 
a linear trend there. A line is drawn, with a transparent ruler, which seems 
to fit the plotted data best. Two points on this line are (6, 18) and (8, 
32). Determine an equation expressing v in terms of t whose logarithmic 
graph is the line. 

19. Draw the graph of $y = 25x^n$ on logarithmic coordinate paper, (a) when n = 2, 
(b) when n = -2. Mark scales clearly. 

20. The graph of $y = \log_{10} x$ assists one in remembering several important 
properties of the logarithms of real numbers. Sketch this graph and 
state some of these properties. 

21 . Read and report on one or more of the references cited in §17. 

Note. Source material for additional exercises on curve fitting 
may be found in the current volumes of the following publications: 

1. Statistical Abstract of the United States. 

2. World Almanac and Book of Facts. 



CHAPTER VIII 

CORRELATION THEORY 

1. The Meaning of Simple Correlation. So far we have been 
concerned with the problems which arise from variation in a single 
variable. We will now consider the simultaneous variation of two 
variables. Methods for disclosing the facts of co-variation and for 
measuring the degree of relationship existing between two variables 
are due mainly to the English biometricians Sir Francis Galton 
(1822-1911) and Karl Pearson (1857-1936). 

Data presenting two sets of related measurements or observations 
may arise in many fields of activity yielding N pairs of corresponding 
variates $(x_i, y_i)$, i = 1, 2, 3, ..., N. Thus x may represent July 
rainfall and y the average yield of corn in a certain section; x may be 
an index of commodity prices and y an index of employment over 
the same period; we may be interested in a group of school children 
in which x is their height and y their weight, or x may refer to their 
reading ability and y to their spelling ability; we may be studying the 
chance distributions which are obtained in throwing two dice where 
x is the number obtained in throws of a single die and y is the number 
obtained in throws of the two dice together. 

Example 1. In the following set of selected heights (inches), x = stature of 
father, y = stature of son. 


    x    69   70   69   68   70   73   69   67   69   64
    y    68   69   72   67   70   71   72   66   71   65


Example 2. (Snedecor.) The following data on twelve trees are adapted from 
the results of an experiment to test the phenomenon that the injury by codling 
moth larvae seems to be greatest on apple trees bearing a small crop. Here 
x = hundreds of fruit on a tree, y = percentage of fruits wormy. 

    x    15   15   12   26   18   12    8   38   26   19   29   22
    y    52   46   38   37   37   37   34   25   22   22   20   14

When the given pairs of values are represented by dots locating 
the points whose rectangular coordinates are (x, y) we obtain a 
so-called “scatter diagram” (Figure 33). The problem is to determine 
the degree of association, or correlation as it is called, between the 
x's and the corresponding y's since this indicates the significance of 
the relationship. 

The field of correlation may be thought of as bounded on the one 
extreme by perfect functional dependence and on the other extreme 
by complete independence in the probability sense. For example, 
the pairs of values which satisfy the equation $y = 2x - h$ do not present 
a statistical problem. In this case the relationship is defined by a 
mathematical function y = f(x). Similarly, at the other extreme we 
would not be concerned with pairs of values which are completely 
independent in the probability sense, as, for example, the grades of 
students in statistics and the heights of their fathers. Two variables 
are said to be statistically related when they lie between these two 
extremes of relationship. 

[Fig. 33: scatter diagram] 

The theory of correlation is concerned with a twofold problem: 
first with measuring the indicated relationship, and secondly with 
predicting or estimating the average value of y associated with a 
designated value of x. 

2. The Coefficient of Correlation. It is fairly obvious from Figure 
33 that with values of x in an assigned interval $\Delta x$ ($\Delta x$ small) the 
corresponding values of y differ considerably. There is said to be 
positive correlation if, for an assigned x larger than $\bar{x}$, the mean of the 
corresponding y values is larger than $\bar{y}$, and, for values of x smaller 
than $\bar{x}$, the mean of the corresponding values of y is less than $\bar{y}$. 
On the other hand, as x increases the tendency may be for y to 
decrease. In this case, for an assigned x larger than $\bar{x}$ the mean of the 
corresponding y values is less than $\bar{y}$, and for an assigned x less than 
$\bar{x}$ the mean of the corresponding y's is greater than $\bar{y}$. There is then 
said to be negative correlation. If, for an assigned x taken at random, 
a corresponding y is no more likely to be above than below $\bar{y}$, the 
variables are independent in the statistical or probability sense and 
there is said to be zero correlation between them. 

When the variables are correlated there is a tendency for the dots 





Theorem I. The value of r is independent of the origin of reference 
and the units of measurement. 

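Proof: Let 

    $u = \dfrac{x - x_0}{h}, \qquad v = \dfrac{y - y_0}{k}.$

Then 

    $x = uh + x_0, \quad y = vk + y_0, \quad \sigma_x = h\sigma_u, \quad \sigma_y = k\sigma_v.$

Substituting in (2) we obtain 

(4)     $r = \dfrac{\frac{1}{N}\sum (u - \bar{u})(v - \bar{v})}{\sigma_u \sigma_v}$

(4a)    $\phantom{r} = \dfrac{\frac{1}{N}\sum uv - \bar{u}\,\bar{v}}{\sigma_u \sigma_v}$

where $\sigma_u = \left[\frac{1}{N}\sum u^2 - \bar{u}^2\right]^{1/2}$, $\sigma_v = \left[\frac{1}{N}\sum v^2 - \bar{v}^2\right]^{1/2}$. 

Since (4) and (4a) are independent of the constants $x_0$, $y_0$, h, and k, 
the theorem is proved. 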
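
Theorem I can also be checked numerically. The following is a minimal sketch, not part of the original text: it takes the father and son heights of Example 1, applies a change of origin and unit with arbitrarily chosen (hypothetical) positive constants x0, h, y0, k, and compares the two values of r. The helper name pearson_r is our own.

```python
import math

def pearson_r(xs, ys):
    """Product-moment r computed from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

# Heights of fathers (x) and sons (y), Example 1.
x = [69, 70, 69, 68, 70, 73, 69, 67, 69, 64]
y = [68, 69, 72, 67, 70, 71, 72, 66, 71, 65]

# Hypothetical constants: u = (x - x0)/h, v = (y - y0)/k, with h, k > 0.
x0, h = 60.0, 2.54
y0, k = 65.0, 0.5
u = [(a - x0) / h for a in x]
v = [(b - y0) / k for b in y]

r_xy = pearson_r(x, y)
r_uv = pearson_r(u, v)
print(round(r_xy, 4), round(r_uv, 4))   # the two values agree
```

Any other positive h and k would serve equally well; the constants above were picked only to make the change of units visible.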

This property is of fundamental importance. It means that the units of 
measurement for the two sets of observed quantities can be chosen 
independently of each other. If the two sets of quantities are of the same kind, the 
units need not be the same in both cases; and, what is more important, if the 
quantities are of different kinds, so that the units are not comparable at all, the 
coefficient r nevertheless may have a definite meaning. (Of course, the value of 
the coefficient will be affected by a change in the method of measurement of one 
of the quantities, such as the substitution of an area for a length in estimating 
the size of an object, or the assignment of different relative weights to the 
questions on an examination.) 

The pairs $(x_i, y_i)$ may be all distinct or there may be repetitions among them. 
But it is necessary to impose the condition that neither $x_i$ nor $y_i$ shall be 
constant throughout. This condition is imposed to insure that the denominator 
shall not vanish in the various formulas for r. The Algebra of Correlation, 
Dunham Jackson, American Mathematical Monthly, vol. 31 (1924), pp. 110-121. 


When the given values of x and y are large and a computing machine 
is not available, the computations may be lightened by an appropriate 
choice of these constants. If only the origin of reference is changed, 
then h = k = 1, and $u = x - x_0$, $v = y - y_0$. If the means are taken 
as the origin of reference by letting $x' = x - \bar{x}$ and $y' = y - \bar{y}$, 
then $\bar{x}' = \bar{y}' = 0$ and the formula becomes, 

(5)     $r = \dfrac{\sum x'y'}{\left[\sum x'^2 \sum y'^2\right]^{1/2}}.$

A subscript notation should be attached to r when there are several 
series of variates. Thus, $r_{xy}$ for the (x, y) series, $r_{xz}$ for the (x, z) 
series, $r_{12}$ for the series denoted by $(x_1, x_2)$, etc. 

Example 3. To illustrate the formulas we will compute the value of r for the 
following data. Here x = Brokers' Loans in billions of dollars and y = The 
Annalist's index of the prices of fifty rail and industrial stocks in 1929. We choose 
u = x - 5.00 and v = y - 250. 


    Month       x      y       u      v        uv       u^2        v^2
    J         5.33    248     .33     -2     -0.66     .1089          4
    F         5.67    248     .67     -2     -1.34     .4489          4
    M         5.65    243     .65     -7     -4.55     .4225         49
    A         5.56    249     .56     -1      -.56     .3136          1
    M         5.53    235     .53    -15     -7.95     .2809        225
    J         5.28    265     .28     15      4.20     .0784        225
    J         5.77    282     .77     32     24.64     .5929       1024
    A         6.02    303    1.02     53     54.06    1.0404       2809
    S         6.35    290    1.35     40     54.00    1.8225       1600
    O         6.80    230    1.80    -20    -36.00    3.2400        400
    N         4.88    201    -.12    -49      5.88     .0144       2401
    D         3.45    206   -1.55    -44     68.20    2.4025       1936
    Sums                     6.29      0    159.92   10.7659      10678
    (1/N) Sums              .5242      0   13.3267     .8972   889.8333

Computations: $\sigma_u = [.8972 - (.5242)^2]^{1/2} = .79$, 
              $\sigma_v = [889.8333]^{1/2} = 29.83$. 

From (4a) we have, 

    $r = \dfrac{13.3267}{(29.83)(.79)} = .57.$
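
The arithmetic of Example 3 can be retraced by machine. The sketch below is our own illustration of formula (4a), with u = x - 5.00 and v = y - 250 as in the text. Three of the y entries are illegible in the table as it has come down to us; the values 248, 248 and 282 used here are assumptions, chosen because they reproduce the printed v, uv and v^2 columns.

```python
import math

# Example 3: x = Brokers' Loans, y = The Annalist's stock price index, 1929.
# ASSUMPTION: y for January, February and July (248, 248, 282) is inferred
# from the printed v column (v = -2, -2, 32), not read directly from the table.
x = [5.33, 5.67, 5.65, 5.56, 5.53, 5.28, 5.77, 6.02, 6.35, 6.80, 4.88, 3.45]
y = [248, 248, 243, 249, 235, 265, 282, 303, 290, 230, 201, 206]

n = len(x)
u = [a - 5.00 for a in x]      # change of origin for x
v = [b - 250 for b in y]       # change of origin for y

mean_u = sum(u) / n
mean_v = sum(v) / n
mean_uv = sum(a * b for a, b in zip(u, v)) / n
sigma_u = math.sqrt(sum(a * a for a in u) / n - mean_u ** 2)
sigma_v = math.sqrt(sum(b * b for b in v) / n - mean_v ** 2)

# Formula (4a).
r = (mean_uv - mean_u * mean_v) / (sigma_u * sigma_v)
print(round(mean_uv, 4), round(sigma_u, 2), round(sigma_v, 2), round(r, 2))
```

Under these assumptions the result agrees with the hand computation to two decimal places.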
Experienced computers use calculating machines to great advantage 
in large-scale computational studies. The following reference is 
recommended to students who expect to engage in such work: “The 
Calculation of Correlation Coefficients from Ungrouped Data,” 
P. S. Dwyer, Journal of the American Statistical Association, vol. 35 
(1940), pp. 671-673. 

Exercises 

1. When x' and y' represent deviations from the means, 

(a) Show from (1) that $\sum x'y' = Nr\sigma_x\sigma_y$. 

(b) Show that $N\sigma_x^2 = \sum x'^2$. 

2. Derive formula (3) from (2). 

3. Show that (3) may be written as 

    $r = \dfrac{N\sum xy - \sum x \sum y}{\left[\{N\sum x^2 - (\sum x)^2\}\{N\sum y^2 - (\sum y)^2\}\right]^{1/2}}.$

4. Find r for the data of Example 1. 

5. Find r for the data of Example 2. 

6. The following data represent the ages of husband (x) and wife (y) of twenty 
couples. Find r using (5). Ans. 0.850. 

    x    22   24   26   26   27   27   28   28   29   30
    y    18   20   20   24   22   24        24        25

    x    30        31   32   33   34   35   35   36   37
    y    29   32   27   27   30   27   30         0   32



7. In studying a set of pairs of related variates, a statistician has completed the 
preliminary arithmetic and obtained the following results: 

    N = 100; $\sum x^2$ = 1,585,000; $\sum x$ = 12,500; $\sum xy$ = 1,007,425; 
    $\sum y^2$ = 648,100; $\sum y$ = 8,000. 

Find $\bar{x}$, $\bar{y}$, $\sigma_x$, $\sigma_y$, r. 

8. The table in Exercise 2, page 97, contains the grades made on two tests by 

twenty-five students in mathematics. Find r for these data. Ans. 0.786. 

9. Suggest examples of negative correlation. 

10. In the following anthropometric measurements on a random sample of 
twenty male freshmen, taken from the Physical Education Department, 
x represents height, y represents chest measurement, both measurements 
being taken to the nearest tenth of an inch, and z represents weight to the 
nearest pound. Find the coefficient of correlation (a) between x and y, 
(b) between x and z, (c) between y and z. 

      x      y      z          x      y      z
    68.5   33.6   148        65.3   33.0   136
    67.2   35.0   144        65.1   34.0   144
    67.7   30.2   145        64.8   37.3   170
    63.8   30.0              69.6   33.4   154
    69.9   33.0   130        68.2   31.5   122
    64.7   31.0   112        68.8   32.0   141
    68.4   33.0   134        72.3   35.0   159
    66.4   30.2   112        67.8   33.7   134
    69.1   33.3   143        71.3   31.5   136
    71.0   32.3   136        63.5   33.6   126


4. Regression. The properties of r can be studied by fitting a 
line to the scatter diagram in such a way as to make the sum of the 
squares of the vertical distances from the points to the line a mini- 
mum. 

When such a line is referred to the point $(\bar{x}, \bar{y})$ as origin, we have 
seen (§9, Chapter VII) that its equation is $y' = m_1x'$ where 

    $m_1 = \dfrac{\sum x'y'}{\sum x'^2}$

and $x' = x - \bar{x}$, $y' = y - \bar{y}$. This value of $m_1$ may easily be 
expressed in terms of r and the standard deviations, as follows: 

    $m_1 = \dfrac{Nr\sigma_x\sigma_y}{N\sigma_x^2} = r\dfrac{\sigma_y}{\sigma_x}.$

Therefore, the equation of our line, referred to a system of axes whose 
origin is at the means of the variates, is 

(6)     $y' = r\dfrac{\sigma_y}{\sigma_x}\,x'.$

This is called the regression line of y on x. The term regression 
was used first by Galton in studying inheritance of stature. He 
found that offspring of abnormally tall or short parents tend to 
“step back” or “regress” to the ordinary population height. 
However, as now used, regression line has no reference to biometry, 
but is merely a convenient term. 

By fitting a line $x' = m_2y'$ to the points of the scatter diagram in 
such a way that the sum of the squares of the horizontal distances 
from the points to the line shall be a minimum, it is possible to 
deduce a second regression line (the regression line of x on y) whose 
equation, referred to $(\bar{x}, \bar{y})$, is 

(7)     $x' = r\dfrac{\sigma_x}{\sigma_y}\,y'.$

Note that (7) cannot be obtained by solving for x’ in (6). The 
two regression lines will coincide if, and only if, r = ±1. From 
the equations of the regression lines it is evident that if r > 0, an 
increase in the one variable tends to accompany an increase in the 
other; if r < 0, an increase in the one tends to be accompanied by a 
decrease in the other. 
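
That the second regression line is not obtained by solving the first for x' can be seen on a small set of data. The following sketch is ours, not the author's: it computes both slopes for the heights of Example 1 and sets the slope of the x-on-y line beside the value 1/m1 that solving the y-on-x line would give. The variable names are our own.

```python
import math

x = [69, 70, 69, 68, 70, 73, 69, 67, 69, 64]   # stature of father
y = [68, 69, 72, 67, 70, 71, 72, 66, 71, 65]   # stature of son
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)
m1 = sxy / sxx   # y' = m1 x', regression of y on x; equals r*sigma_y/sigma_x
m2 = sxy / syy   # x' = m2 y', regression of x on y; equals r*sigma_x/sigma_y

# Solving y' = m1 x' for x' would give x' = (1/m1) y', which differs from
# x' = m2 y' unless r = +1 or r = -1.
print(round(m1, 3), round(m2, 3), round(1 / m1, 3))
print(round(m1 * m2, 4), round(r * r, 4))       # these two agree
```

The product of the two slopes equals r squared, so the lines can coincide only when that product is unity.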

Equations (6) and (7) are usually expressed in terms of the 
original variables x and y instead of the deviations x' and y'. It is 
obvious that they may be written as 

(8)     $y - \bar{y} = r\dfrac{\sigma_y}{\sigma_x}(x - \bar{x})$

and 

(9)     $x - \bar{x} = r\dfrac{\sigma_x}{\sigma_y}(y - \bar{y})$

when referred to the origin of x and y. 

Equation (8) may be used to estimate values of y corresponding 
to designated values of x. Similarly, from equation (9) we may 
estimate x for designated values of y. It would be appropriate to 
use (8) as a predicting equation when the variation in y is caused or 
controlled by the variation in x; (9) would be used when the varia- 
tion in x is caused or controlled by the variation in y . 
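
As a minimal sketch of how (8) and (9) serve as predicting equations, the two functions below (names are our own) take the means, standard deviations and r as given. The figures are those stated in Exercise 4 of this section, where x is weight in pounds and y is height in inches.

```python
def predict_y(x, mean_x, mean_y, sigma_x, sigma_y, r):
    """Estimate y for a designated x by equation (8)."""
    return mean_y + r * (sigma_y / sigma_x) * (x - mean_x)

def predict_x(y, mean_x, mean_y, sigma_x, sigma_y, r):
    """Estimate x for a designated y by equation (9)."""
    return mean_x + r * (sigma_x / sigma_y) * (y - mean_y)

# Summary figures from Exercise 4: weights (x) and heights (y) of 1000 men.
stats = dict(mean_x=150.0, mean_y=68.0, sigma_x=20.0, sigma_y=2.5, r=0.60)

height = predict_y(200.0, **stats)   # a man weighing 200 lbs
weight = predict_x(60.0, **stats)    # a man five feet (60 in.) tall
print(round(height, 2), round(weight, 1))   # 71.75 111.6
```

Note that each estimate uses its own line; estimating the weight by solving (8) for x would give a different and less appropriate value.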

The quantity $m_1 = r(\sigma_y/\sigma_x)$ is called the regression coefficient of y 
on x, being the variation in y corresponding to a unit change in x. 
Likewise, $m_2 = r(\sigma_x/\sigma_y)$ is called the regression coefficient of x on y. 
Thus the numerical value of r is given by $(m_1m_2)^{1/2}$ but its sign must 
be that which is common to the two regression coefficients. The 
following quotation from Snedecor (reference 13, list p. 6) sheds light 
on the distinction between regression and correlation. 

The point of interest here is that r is the geometric mean of the two regression 
coefficients. In ordinary units of measurement, therefore, r is an average of the 
two regression coefficients used in (i) estimating y from x and (ii) estimating 
x from y. This serves to clarify the relation of the two coefficients, correlation 
and regression, in measuring relationship. The latter is the appropriate one if 
one variable, y, may be designated as dependent on the other, x. Values of y 
may be partly controlled or caused by x, as when the available amounts of some 
glandular secretion cause differences in the sizes of organisms. Or, y may be 
subsequent to x, as weight gain in nutrition experiments follows the measurement 
of initial weight. In such cases, the regression of y on x is usually the statistic 
that furnishes the information desired. It is then appropriate to attempt to 
estimate the value of y from a knowledge of the corresponding value of x. 
Correlation, on the other hand, is the appropriate measure of the relation between 
two variates like statures of husband and wife. The two heights are known to 
be associated through some complex of social and biological causes, but neither 
may be looked upon as a consequence of the other. In this sense correlation 
is a two-way average of relationship, while regression is directional. Of course, 
there are many variables whose relationship may be studied by means of either 
correlation or regression, or both. It is necessary only to keep clearly in mind 
the character of the relation being considered. 

Geometrically, $m_1$ is the slope of line (8) and $1/m_2$ is the slope of 
line (9). The two lines intersect at $(\bar{x}, \bar{y})$. 
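
The rule that r is the geometric mean of the two regression coefficients, with their common sign attached, may be illustrated on the codling moth data of Example 2, where the correlation is negative. This is a sketch of our own; the use of math.copysign to attach the sign is simply one convenient way of doing so.

```python
import math

# Example 2: x = hundreds of fruit on a tree, y = percentage of fruits wormy.
x = [15, 15, 12, 26, 18, 12, 8, 38, 26, 19, 29, 22]
y = [52, 46, 38, 37, 37, 37, 34, 25, 22, 22, 20, 14]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

m1 = sxy / sxx                      # regression coefficient of y on x
m2 = sxy / syy                      # regression coefficient of x on y
r_direct = sxy / math.sqrt(sxx * syy)

# Numerical value from the geometric mean; sign common to m1 and m2.
r_from_slopes = math.copysign(math.sqrt(m1 * m2), m1)
print(round(m1, 4), round(m2, 4))
print(round(r_direct, 3), round(r_from_slopes, 3))
```

Both coefficients are negative here, so the square root alone would give the wrong sign for r.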



Exercises 

1. Derive the equation of the line of regression of x on y as suggested above. 

2. Find the equations of both lines of regression for Exercise 6 (page 176), and 
plot them. Ans. y = .888x - .64, 
                x = .825y + 8.55. 

3. Using the appropriate equation, find the estimated values of y corresponding 
to the given values of x, for Exercise 6 (page 176). 

4. Given the following results for the heights and weights of 1000 men students: 

    $\bar{y}$ = 68.00 in., $\bar{x}$ = 150.00 lbs., r = .60, 
    $\sigma_y$ = 2.50 in., $\sigma_x$ = 20.00 lbs. 

John Doe weighs 200 lbs., Richard Roe is five feet tall. 

Estimate the height of Doe from his weight, and the weight of Roe from 
his height. 

Ans. Doe's height = 71.75 in. 
     Roe's weight = 111.6 lbs. 

5. (a) Given the following: 

    $\sum x$ = 150,000, $\sum x^2$ = 22,725,000, $\sum xy$ = 10,522,500, 
    $\sum y$ = 70,000, $\sum y^2$ = 4,936,000, n = 1000. 

Find $\bar{x}$, $\bar{y}$, $\sigma_x$, $\sigma_y$, r, and the lines of regression. 

(b) Suppose the data in (a) refer to the weight in pounds (x) and the height 
in inches (y) of a sample of 1000 policemen. Suppose Paul Private weighs 
160 pounds and Saul Sergeant is 6 feet tall. Estimate the height of 
Private and the weight of Sergeant. 


5. The Standard Error of Estimate. The average concentration 
of the points around the regression line of y on x may be measured 
by the expression $\frac{1}{N}\sum d^2$ where d is the difference between an 
observed y and the y obtained from the regression line. The value of 
$\frac{1}{N}\sum d^2$ will be denoted by $S_y^2$, and $S_y$ is called the standard deviation 
of the errors of estimate, or more briefly the standard error of estimate. 
The errors of estimate are the deviations of the observed values of y 
from the corresponding estimated y's. Or to describe them another 
way, they are the deviations of the sample y's from the assumed 
population y's. It can be shown that $S_y^2 = \sigma_y^2(1 - r^2)$. To prove 
this we may write the sum of the squares of the deviations in the 
form: 

    $NS_y^2 = \sum\left(y' - r\dfrac{\sigma_y}{\sigma_x}x'\right)^2 = \sum y'^2 - 2r\dfrac{\sigma_y}{\sigma_x}\sum x'y' + r^2\dfrac{\sigma_y^2}{\sigma_x^2}\sum x'^2$

    $\phantom{NS_y^2} = N\sigma_y^2 - 2Nr^2\sigma_y^2 + Nr^2\sigma_y^2 = N\sigma_y^2(1 - r^2).$

Hence, we have 

(10)    $S_y^2 = \sigma_y^2(1 - r^2)$

and 

(10a)   $S_y = \sigma_y(1 - r^2)^{1/2}.$

An analogous consideration of the differences between the x's and 
the regression line of x on y gives for the square of the standard 
error of estimate of the x's 

(11)    $S_x^2 = \sigma_x^2(1 - r^2).$
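
Formula (10a) can be checked against the definition of the standard error of estimate. The sketch below is our own; it uses the five fictitious pairs that appear in Exercise 1 at the end of §8, computes the errors of estimate d directly from the regression line, and compares the result with $\sigma_y(1 - r^2)^{1/2}$.

```python
import math

# Five fictitious pairs (Exercise 1 following section 8).
x = [8, 6, 4, 7, 5]
y = [9, 8, 5, 6, 2]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sigma_x = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
sigma_y = math.sqrt(sum((b - my) ** 2 for b in y) / n)
r = sxy / (n * sigma_x * sigma_y)

# First way: root mean square of the errors of estimate d.
slope = r * sigma_y / sigma_x
d = [b - (my + slope * (a - mx)) for a, b in zip(x, y)]
s_direct = math.sqrt(sum(e * e for e in d) / n)

# Second way: formula (10a).
s_formula = sigma_y * math.sqrt(1 - r * r)
print(round(r, 2), round(slope, 2), round(s_direct, 3), round(s_formula, 3))
```

The two values of $S_y$ agree, as the algebra above requires.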

6. Properties of the Correlation Coefficient and Standard Error 
of Estimate. Certain properties of r may now be deduced. It is 
obvious from (10) that $|r| \le 1$ because both the left member and 
$\sigma_y^2$ are positive or zero. Therefore, 

    $-1 \le r \le 1.$

If the points all lie exactly on the regression line, the left member of 
(10) vanishes and $r = \pm 1$. There is then said to be perfect linear 
correlation, since the relation between x and y is given exactly by a 
linear function. A large numerical value of r means that the 
regression lines are close to coincidence and the points in a scatter diagram 
cluster closely around the regression lines. 

When the regression lines (8) and (9) are expressed in standard 
units, they become respectively 

(12)    $t_y = rt_x$

and 

(13)    $t_x = rt_y$  or  $t_y = \dfrac{1}{r}\,t_x$

where 

    $t_x = \dfrac{x - \bar{x}}{\sigma_x}, \qquad t_y = \dfrac{y - \bar{y}}{\sigma_y}.$

In this form we see at once that as one variable $t_x$ increases, the other 
variable $t_y$ increases (or decreases) to an extent that depends upon r. 
Thus r measures co-variation in the variables when they are 
expressed in comparable units and when regression is linear. 

In standard units, r is the slope of line (12) and 1/r is the slope of 
line (13). When r = 0, the regression equations become $t_y = 0$ and 
$t_x = 0$ in standard units or $y = \bar{y}$ and $x = \bar{x}$ in the original units. 
These are also the equations of the coordinate axes. Therefore, when 
r = 0 the regression lines are perpendicular to each other and coincide 
with the $t_x$ and $t_y$ axes. When r = 1 the regression equations become 
identical and the two lines coincide in quadrants I and III. Similarly, 
when r = -1 they coincide in quadrants II and IV. In each 
case the coincident lines bisect the quadrants if the equations are 
expressed in standard units, but not otherwise unless $\sigma_y = \sigma_x$. The 
angle $\theta$ between the regression lines varies from 0° to 90° as r varies 
from one to zero. 

When the variables are independent in the statistical sense there is 
no correlation between x and y, and r = 0. On the 
other hand, when r = 0, it is not necessarily true that the variables 
are statistically independent. Indeed there may be a high 
correlation¹ with non-linear regression when r = 0. (Non-linear regression 
will be considered in §21.) Incidentally, the phrase “independent 
variables” in the statistical sense should not be confused with the 
phrase “independent variables” which is used in the ordinary sense 
of analysis to designate the variables on which a specified function 
depends. However, the two usages, though quite distinct, are not 
fundamentally contradictory, since functional dependence can be 
regarded as a limiting case of statistical dependence. 

1 See H. L. Rietz, On Functional Relations for which the Coefficient of Correlation 
is Zero. Journal American Statistical Association, vol. 16, 1919, pp. 472-476. 
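
A small illustration of a functional relation for which r is zero may be helpful. The sketch below is ours and uses hypothetical values: x runs over a range symmetric about zero and y is taken as the square of x, so that y is completely determined by x and yet the product-moment coefficient vanishes.

```python
import math

# Hypothetical data: y is an exact (non-linear) function of x.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [a * a for a in x]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)
print(r)   # zero, although y depends entirely on x
```

The regression here is a parabola, not a line, which is why the linear coefficient r fails to register the dependence.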
For an appreciation of the use of $S_y$ in passing judgment upon the 
precision to be expected in estimating values of y by means of the 
regression equation of y on x, it is instructive to consider 
simultaneously the meanings of (8) and (10a) as |r| varies from 0 to 1. When 
r = 0, (8) becomes $y = \bar{y}$ which means that the best estimate of y 
for any value of x is the mean of the y-distribution. In other words, 
knowledge of x is of no value in predicting y. When r = 0 in (10a), 
$S_y = \sigma_y$. This is to be expected since the dispersion $S_y$ about the 
line $y = \bar{y}$ is the same as the dispersion $\sigma_y$ of the given y's about their 
mean. But as |r| increases from 0 to 1, $S_y$ decreases from $\sigma_y$ to 0. 
Graphically, the meaning of this improvement in $S_y$ in comparison 
with $\sigma_y$, as r increases, is shown in Figure 36 where parallel lines are 
drawn at a vertical distance of $S_y$ on either side of the regression line 
RR'. For a given value of $|r| \ne 0$ this strip encloses the average 
dispersion about the line. The strip on either side of $y = \bar{y}$ at a 
distance of $\sigma_y$ from it encloses the average dispersion about the line when 
r = 0. As |r| increases from 0, the line rotates from the horizontal 
position of $y = \bar{y}$ to the terminal position it would have when |r| = 1, 
and at the same time $S_y$ decreases toward 0. Formula (10a) tells 
us that as |r| thus increases, $S_y$ decreases from $\sigma_y$ in proportion to 
$(1 - r^2)^{1/2}$. 

Fig. 36. For a Fixed Value of $\sigma_y$, $S_y$ Decreases in Proportion 
to $(1 - r^2)^{1/2}$ as r Increases 

A similar analysis could be made concerning the line of regression 
(9) of x on y which rotates from the vertical position $x = \bar{x}$ when 
|r| = 0 to meet and coincide with line (8) when |r| = 1. As line (9) 
rotates, $S_x$ decreases from $\sigma_x$ to 0 in proportion to $(1 - r^2)^{1/2}$ as 
r increases. 
As $|r| \to 1$, (12) and (13) rotate toward each other at equal angular 
velocities. When they are coincident their slope is $\pm 1$. Lines (8) 
and (9) rotate at angular velocities which are proportional to 
$m_1 = \tan\alpha$ and $m_2 = \tan\beta$, respectively, where $m_1$ and $m_2$ are 
defined in §4. Their slope at coincidence is $\pm\sigma_y/\sigma_x$. For line (12) 
it can be shown that 

(14)    $\dfrac{1}{N}\sum\delta^2 = 1 - r^2$

where $\delta$ is the difference between an observed value of $t_y$ and the 
ordinate obtained from (12) for the corresponding value of $t_x$. Thus, 

    $\dfrac{1}{N}\sum\delta^2 = \dfrac{1}{N}\sum(t_y - rt_x)^2$

    $\phantom{\dfrac{1}{N}\sum\delta^2} = \dfrac{1}{N}\sum t_y^2 - \dfrac{2r}{N}\sum t_xt_y + \dfrac{r^2}{N}\sum t_x^2$

    $\phantom{\dfrac{1}{N}\sum\delta^2} = 1 - 2r^2 + r^2$

    $\phantom{\dfrac{1}{N}\sum\delta^2} = 1 - r^2.$

This result would also be apparent from the derivation of (10) since 
$\delta = d/\sigma_y$ where d refers to residuals in units other than standard 
units. 

It is obvious from (14) that the maximum value of $\frac{1}{N}\sum\delta^2$ is unity. 
Therefore, adopting 

(15)    $1 - \dfrac{1}{N}\sum\delta^2$

as a measure of goodness of fit, we see from (14) and (15) that $r^2$ 
is a measure of the goodness of fit of (12) to the points of the scatter 
diagram expressed in standard units. By an analogous argument a 
similar conclusion concerning (13) can be made. 

7. Further Discussion. Given a set of N pairs of x and y 
correlated values. Suppose the necessary constants are evaluated to 
obtain the regression equation (8). Then if the given values of x 
are substituted in this equation, a set of estimated y's, say $y_E$, will be 
obtained. The mean, $\bar{y}_E$, of these estimated y's is the same as the 
mean of the observed y's. The proof is as follows. From (8) we have 

    $y_E = \bar{y} + r\dfrac{\sigma_y}{\sigma_x}(x - \bar{x}).$

Then 

    $\bar{y}_E = \dfrac{\sum y_E}{N} = \dfrac{N\bar{y}}{N} + r\dfrac{\sigma_y}{\sigma_x}\cdot\dfrac{\sum(x - \bar{x})}{N}.$

But $\sum(x - \bar{x}) = 0$ by Theorem VI, Chapter III. So $\bar{y}_E = \bar{y}$. 

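We now state the following theorem. 

Theorem II. The variance, $\sigma_{Ey}^2$, of the estimated y's equals $r^2\sigma_y^2$. 
Proof: By definition, 

    $N\sigma_{Ey}^2 = \sum(y_{E,i} - \bar{y}_E)^2.$

From the above discussion, $(y_{E,i} - \bar{y}_E)$ is the same as $(y_{E,i} - \bar{y})$, which 
is given by (8). So 

    $N\sigma_{Ey}^2 = r^2\dfrac{\sigma_y^2}{\sigma_x^2}\sum(x_i - \bar{x})^2 = Nr^2\sigma_y^2.$

Hence 

(16)    $\sigma_{Ey}^2 = r^2\sigma_y^2.$

From this theorem and (10) we obtain 

(17)    $S_y^2 = \sigma_y^2 - \sigma_{Ey}^2.$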

This relation helps to clarify the meaning of r and of $S_y$. It is 
conventional to call $\sigma_{Ey}^2$ the variance in y which can be explained from 
knowledge of x; that is, which the regression of y on x accounts for. 
(In the language of some writers, $\sigma_{Ey}^2$ measures the variation of 
regression about the mean.) Therefore, (17) shows that $S_y^2$ is the 
variation in y after the accompanying variation in x is duly 
discounted. $S_y^2$ is sometimes called the residual variance because it 
measures the variation in the dependent variable y which knowledge 
of x fails to account for. This relation can be depicted 
geometrically by the sides of a right triangle. To standardize the 
representation we can take $\sigma_y = 1$ as the diameter of a semicircle within which 
is inscribed the right triangle, as in Figure 37. In the figure, 
$\cos\theta = \sigma_{Ey}/\sigma_y$. So from (16) we have $\cos\theta = r$. The particular values of 
$\theta$ in the figure, found from a table of cosines, are $\theta$ = 36° 52' when 
r = .8, and $\theta$ = 25° 50' when r = .9. When r = 1, then $\sigma_{Ey} = \sigma_y$ 
and the regression of y on x accounts for all the variation in y. 

[Fig. 37: right triangles inscribed in a semicircle of diameter $\sigma_y = 1$, 
drawn for r = .8 and r = .9] 
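
The statements of this section lend themselves to a numerical check. The following sketch is our own and uses the heights of Example 1: it forms the estimated y's from the regression line of y on x, and then verifies that their mean equals the mean of the observed y's, that their variance equals $r^2\sigma_y^2$ as in (16), and that the residual variance and the explained variance add up to $\sigma_y^2$ as in (17).

```python
import math

x = [69, 70, 69, 68, 70, 73, 69, 67, 69, 64]
y = [68, 69, 72, 67, 70, 71, 72, 66, 71, 65]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)
var_y = syy / n
r = sxy / math.sqrt(sxx * syy)

# Estimated y's from equation (8); sxy/sxx equals r*sigma_y/sigma_x.
y_est = [my + (sxy / sxx) * (a - mx) for a in x]
mean_est = sum(y_est) / n
var_est = sum((e - mean_est) ** 2 for e in y_est) / n          # explained variance
var_resid = sum((b - e) ** 2 for b, e in zip(y, y_est)) / n    # residual variance

# Angle of the right triangle: cos(theta) = sigma_Ey / sigma_y.
theta = math.degrees(math.acos(math.sqrt(var_est / var_y)))
print(round(var_est, 4), round(r * r * var_y, 4))              # relation (16)
print(round(var_resid + var_est, 4), round(var_y, 4))          # relation (17)
print(round(theta, 2))
```

The angle printed is the one whose cosine is r for these data, in the sense of Figure 37.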


Theorem III. The correlation between observed and estimated values 
of y is the same as that between the observed values of x and y. 

Proof: We are to show that 

    $\dfrac{\frac{1}{N}\sum y\,y_E - \bar{y}\,\bar{y}_E}{\sigma_{Ey}\,\sigma_y}$

reduces to one of the formulas for r. Substituting the values for 
$y_E$, $\bar{y}_E$, $\sigma_{Ey}$ into the above expression and simplifying, we obtain (3). 
The details of the proof are left to the student as an exercise. 
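
Theorem III may be tried on the data of Examples 1 and 2. The sketch below is our own illustration, with helper names of our choosing. One caution, which is our observation and not a statement of the text: since the expression in the proof divides by the standard deviation of the estimated y's, a positive quantity, the computed coefficient comes out as |r| when r is negative, as it is in Example 2.

```python
import math

def pearson_r(xs, ys):
    """Product-moment r from deviations about the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def estimates(xs, ys):
    """Estimated y's from the regression line of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return [my + slope * (a - mx) for a in xs]

x1 = [69, 70, 69, 68, 70, 73, 69, 67, 69, 64]            # Example 1
y1 = [68, 69, 72, 67, 70, 71, 72, 66, 71, 65]
x2 = [15, 15, 12, 26, 18, 12, 8, 38, 26, 19, 29, 22]     # Example 2
y2 = [52, 46, 38, 37, 37, 37, 34, 25, 22, 22, 20, 14]

r1, r1_est = pearson_r(x1, y1), pearson_r(y1, estimates(x1, y1))
r2, r2_est = pearson_r(x2, y2), pearson_r(y2, estimates(x2, y2))
print(round(r1, 4), round(r1_est, 4))   # equal
print(round(r2, 4), round(r2_est, 4))   # equal in magnitude, opposite in sign
```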

8. Coefficient of Alienation. A measure of the failure to improve 
estimates of y from knowledge of correlation is given by 

(18)    $k = (1 - r^2)^{1/2}.$

It is sometimes called the coefficient of alienation. Incidentally, it 
is interesting to observe that the functional relation between k and r 
is shown, graphically, by a semicircle of unit radius, i.e., 

    $f(r) = (1 - r^2)^{1/2}.$

The formula 

    $k' = 1 - (1 - r^2)^{1/2}$

may be called the improvement factor because it shows the decrease 
in $S_y/\sigma_y$ as |r| increases. It is clear that 

    $k^2 = \dfrac{\sum d^2}{N\sigma_y^2}$  and  $k' = 1 - k.$

Table 31 gives¹ values of k and k' for values of r. With no 
knowledge of correlation, the best estimate of an individual y is $\bar{y}$. Values 
of k' for assigned r's show how much better than this guess is the 
estimate of an individual y value with knowledge of correlation. 
For example, when r = .5 the column headed k in Table 31 shows 
that the standard error $S_y$ is about 87% of $\sigma_y$. Or, from the k' 
column, $S_y$ has been reduced only 13% from what it would have 
been if $\bar{y}$ had been used for prediction purposes. The third column 
thus shows how the prediction value of r varies with r. Thus as |r| 
decreases from 1 to .8, $S_y/\sigma_y$ increases from 0 to 60%. Or from 
another point of view, as |r| increases from 0 to .8, the error of 
estimate is improved by only 40%. A correlation of r = .9 permits 
prediction of individual y's only 56% better than a mere guess based 
on the mean. 

It is fairly obvious that we cannot, with any considerable degree 
of reliability, predict from ordinary values of r an individual y for an 
assigned x. However, with a large N, we can give a very reliable 
prediction of the mean of y values that correspond to an assigned 
value of x. This can best be explained from a correlation table 
which is used when N is large and which will be explained in the 
next section. 


Table 31. Values of r and the Corresponding Values of k and k' 

       r        k        k'
      .1      .995     .005
      .2      .980     .020
      .3      .954     .046
      .4      .917     .083
      .5      .866     .134
      .6      .800     .200
      .7      .714     .286
      .8      .600     .400
      .9      .436     .564
      .92     .392     .608
      .94     .341     .659
      .96     .280     .720
      .98     .199     .801
     1.00    0.000    1.000

¹ Constructed from a table of sines and cosines. Letting $r = \cos\theta$, 
$\sin\theta = (1 - \cos^2\theta)^{1/2} = (1 - r^2)^{1/2}$. 
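
The entries of Table 31 follow directly from formula (18) and may be recomputed instead of being read from a table of sines and cosines. The sketch below is ours; the function name alienation is hypothetical, and the guard against a slightly negative argument of the square root is a precaution of our own against rounding.

```python
import math

def alienation(r):
    """Return (k, k') for a correlation coefficient r, using formula (18)."""
    k = math.sqrt(max(0.0, 1.0 - r * r))   # coefficient of alienation
    return k, 1.0 - k                      # k' is the improvement factor

print("   r      k     k'")
for r in [.1, .2, .3, .4, .5, .6, .7, .8, .9, .92, .94, .96, .98, 1.0]:
    k, k_prime = alienation(r)
    print(f"{r:5.2f}  {k:5.3f}  {k_prime:5.3f}")
```

The slow growth of k' until r is close to unity is what the discussion above describes: even r = .9 gives an improvement factor only a little over one half.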
Exercises 

1. Given the following correlated data: 

    x    8    6    4    7    5
    y    9    8    5    6    2

(a) Compute the correlation coefficient. 

(b) Find the regression line of y on x. 

(c) Find the estimated values of y corresponding to the given values of x. 

(d) Compute the standard error $S_y$ of predictions in two different ways. 

Ans. $r = \dfrac{2.4}{\sqrt{2}\sqrt{6}} = .69$, $m_1 = r\dfrac{\sigma_y}{\sigma_x} = 1.2$, $S_y = \sqrt{3.12} = 1.76$. 

Note. In practical work, it is never worth while calculating a 
correlation coefficient for so few observations. These fictitious data are given 
solely as an exercise on which the student can test his knowledge of the 
methodology. 

2. Prove that the ratio of variance of the estimated y's (taken about their 
mean) to the variance $\sigma_y^2$ of the given y's is equal to $r^2$. 

3. If $S_y^2/\sigma_y^2 = 1 - r^2$ is the percentage of the total variance of y uncontrolled 
by knowledge of x, what is the remaining percentage, determined by or 
calculable from knowledge of x? 

4. What equation is the equivalent mathematical statement for the following 
words? 

If the respective deviations in each series, x and y, from their means 
were expressed in units of standard deviations (that is, if each were 
divided by the standard deviation of the series to which it belongs) and 
plotted to a scale of standard deviations, the slope of a straight line best 
describing the plotted points would be the correlation coefficient, r. 

5. Given the standard deviations $\sigma_x$ and $\sigma_y$ of two distributions of correlated 
variates: 

(a) What is the standard error in estimating y from x if r = 0? 

(b) By how much is $S_y$ in (a) reduced if r is increased to .25? 

(c) How large must r be in order that $S_y$ be one-half as large as in (a)? 

(d) What must r be in order that $S_y$ be reduced to one-third its value in (a)? 

(e) At what value of r is $S_y$ reduced to zero? 

(f) For any value of r, what is the ratio between the standard error of 
estimating y from x and the standard deviation of the y-distribution? 

6. Evaluate the following statements: 

(a) A correlation coefficient less than zero indicates an absence of linear 
relationship. 

(b) A correlation coefficient of r = .6 indicates twice as close relationship 
as a coefficient of r = .3. 


7. If all the points lie exactly on the regression line of y on x, show that $S_y^2 = 0$ 
and hence that $r = \pm 1$. 

8. Show that $S_y^2$ may be computed by means of the relation 

    $NS_y^2 = \sum y'^2 - \dfrac{(\sum x'y')^2}{\sum x'^2}$

where the primes denote deviations from the means. 

9. (For analytics students.) Show that the tangent of the angle from line (8) 
to line (9) is 

    $\tan\theta = \dfrac{1 - r^2}{r}\cdot\dfrac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$

and from line (12) to line (13) is 

    $\tan\theta' = \dfrac{1 - r^2}{2r}.$

What is the value of $\theta$ when r = 1; when r = 0? 

10. The least-squares criterion of best fit requires that $\sum \delta^2$ be
a minimum, where $\delta$ is the distance between the line and a point. Three
cases arise depending on whether

Case I: $\delta$ is measured parallel to the y-axis,

Case II: $\delta$ is measured parallel to the x-axis,

Case III: $\delta$ is measured perpendicular to the line.

We have seen that Case I yields line (12) and that Case II yields line (13).
In Case III the line has no universally accepted name but it may be called
the “geometrically best-fitting line.”

(For calculus students.) For Case III prove the following:

(a) In standard units, the equation of the line is

$$t_y = t_x \ \text{ if } r > 0 \qquad \text{and} \qquad t_y = -t_x \ \text{ if } r < 0.$$

Solution. Let the equation of the required line be

$$t_y = m t_x + k.$$

Then by analytics,

$$\frac{1}{N}\sum \delta^2 = \frac{1}{N}\sum \left( \frac{m t_x + k - t_y}{\sqrt{1 + m^2}} \right)^2 = \frac{m^2 + k^2 + 1 - 2mr}{1 + m^2}.$$

To make this a minimum, first put $k^2 = 0$. Call the result f(m). Then

$$f(m) = \frac{m^2 + 1 - 2mr}{1 + m^2}, \qquad f'(m) = \frac{2m^2 r - 2r}{(1 + m^2)^2}, \qquad f''(m) = \frac{4mr(3 - m^2)}{(1 + m^2)^3}.$$

The first derivative vanishes when $m = \pm 1$, and at these values the second
derivative is positive when m and r have the same sign. Hence f(m) is a
minimum when m = 1 if r > 0 and when m = −1 if r < 0.

(b) If r = 0, all lines (for which $k^2 = 0$) fit equally well. Hint. If r = 0,
f(m) = 1.

(c) $\frac{1}{N}\sum \delta^2 = 1 - |r|$. Hint. What is the value of f(m) when
$m = \pm 1$? Note that |r| = +r, if r > 0 and |r| = −r, if r < 0.

(d) Goodness of fit is measured by |r|.

(e) When r = .6 the fit is twice as good as when r = .3.
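
The Case III result can be checked numerically. The following is a minimal sketch, not part of the original exercise: the function names and the search grid are assumptions made for illustration. It evaluates f(m) = (m² + k² + 1 − 2mr)/(1 + m²) from the solution above and searches a grid of slopes for the minimum.

```python
def mean_sq_perp_distance(m, r, k=0.0):
    """Mean squared perpendicular distance, in standard units,
    from the points to the line t_y = m * t_x + k."""
    return (m * m + k * k + 1.0 - 2.0 * m * r) / (1.0 + m * m)


def best_slope(r, grid=None):
    """Slope on a grid (step .01, an arbitrary choice) that minimizes f(m) for k = 0."""
    if grid is None:
        grid = [i / 100.0 for i in range(-500, 501)]
    return min(grid, key=lambda m: mean_sq_perp_distance(m, r))


for r in (0.6, 0.3, -0.3):
    m = best_slope(r)
    print(r, m, round(mean_sq_perp_distance(m, r), 4))
```

For each nonzero r tried, the grid minimum falls at m = +1 or m = −1 according to the sign of r, and the minimum value agrees with 1 − |r| as asserted in part (c).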

11. The following query and answer appeared in Biometrics Bulletin, vol. 1,
no. 3, pp. 30-37. “Research” assignment: Investigate the references
cited in the answer and justify the procedure which is recommended
(under the given hypothesis).

Query. A problem that has bothered me is the fitting of regression
lines when their position is restricted in some way. For example, suppose
a test is made of the relationship between the number of fish caught in a
body of water and the average number which can be caught out of it, with
a standard amount of fishing. In fitting a regression line to such data,
we know that the point (0, 0) must fall on the line, since if no fish are
present certainly none will be caught. In other words, we have one
point which is free from sampling error. The unique importance of this
point will, it seems to me, make observations in its neighborhood of
relatively less importance than observations at a distance from it, where
there is no fixed guide-post. Do you know of any treatment of situations
of this sort, by which the best straight (or curved) line could be
fitted to data where there is one point which must be satisfied? The
standard deviation from regression (“standard error of estimate”) and
the standard error of the regression would also be available. Or are these
concepts pertinent in such a question?

Answer. Deming (§15 and §11 of reference 4) gives both a general
method and some particular solutions of your problem. Snedecor
(reference 6) opens his Chapter 6 with an illustration of the simple case in
which x is measured without error and the variance of y is constant for
all values of x.

Observations in the neighborhood of (0, 0) may or may not be of less
importance than those at greater distances; it depends on the variance
of y. One often finds that this variance increases with x. In fact, there
are many situations in which it seems reasonable to suppose that in the
sampled population the standard deviation of y is directly proportional
to x. If you think this hypothesis is suitable in your fishing, the
appropriate method is to calculate the ratios x/y where x is the number of
fish caught and y is the total number of fish, then apply to them the
statistical procedure suitable for a single variate. (George W. Snedecor.)

9. Correlation Table. When the sample to be studied is large,
it is more convenient to replace the scatter diagram by a correlation
table. We may divide the xy-plane into rectangles of convenient
size, and all points of the scatter diagram falling within any rectangle
are thought of as being concentrated at the center of this rectangle.
A number is then written within the rectangle to designate the
number of points at its center. A correlation table is therefore a
two-way frequency table exhibiting the frequencies in each class
interval.


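Table 32

     y         65-69  70-74  75-79  80-84  85-89  90-94  95-99   f(y)
        mid       67     72     77     82     87     92     97
 90-94   92        -      -      -      1      2      3      1      7
 85-89   87        -      -      1      3      8      1      5     18
 80-84   82        4      4      6      4      9      1      -     28
 75-79   77        3      3      7      6      4      -      -     23
 70-74   72        2      3      5      6      1      1      -     18
 65-69   67        3      2      -      -      -      -      -      5
 60-64   62        1      -      -      -      -      -      -      1
  f(x)            13     12     19     20     24      6      6    100

(Columns are class intervals of x with their mid-values; rows are class
intervals of y with their mid-values; a dash marks an empty cell.)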

Suppose Table 32 is constructed in this way for a set of average 
daily grades (x) and final examination grades ( y ) of 100 students. 
When the data have been thus grouped into classes, the class marks 
are regarded as the variate values. Thus in Table 32 there are 9 
students whose daily grades are 87 and whose final examination grades 
are 82. The last column labeled f(y) represents the distribution 
of y variates and the last row labeled f(x) represents the distribution 
of x variates. A correlation table is thus a bivariate distribution.
In Table 32 the width of the class interval is the same for x and y, 
but of course this is not generally the case. 

10. Notation. In order to compute r from a correlation table it
will be necessary to develop new notation. Since we are now dealing
with frequencies in both the x-direction and the y-direction, we will
distinguish between them by f(x) and f(y). To be sure, this has
the disadvantage of being the same symbol as that for function, but
from the context no ambiguity should arise.
Generalizing, a correlation table is of the following form:



The rectangles containing the frequencies are called cells. The
frequency in a typical cell is denoted by f(x, y), meaning the frequency
in the cell whose coordinates are x and y, where x and y are the
mid-values of the class intervals. Both columns and rows are
subdistributions of the total frequency N. Each column is a frequency
distribution of y's corresponding to a mid-x value. Similarly, each
row is a frequency distribution corresponding to a mid-y value.
The sum along any row is denoted by $\sum_x f(x, y)$, being the sum of
the frequencies in the (x, y) cells in the x-direction. Since the
marginal total for any row is the total frequency corresponding to
a given value of y, it is therefore written in the column headed f(y).
Thus, in Table 32, for y = 92,

$$\sum_x f(x, y) = \sum_x f(x, 92) = 1 + 2 + 3 + 1 = 7.$$

Similarly, $\sum_y f(x, y)$ denotes a summation in the y-direction of all the
entries in a column, corresponding to a fixed value of x, so it denotes
an entry in the bottom row which contains the f(x) frequencies.
Thus, for x = 67,

$$\sum_y f(67, y) = 4 + 3 + 2 + 3 + 1 = 13.$$

Summarizing,

$$\sum_x f(x, y) = f(y); \qquad \sum_y f(x, y) = f(x). \tag{19}$$

With regard to N, we may obtain it from a correlation table in
three ways: (1) by adding the entries across the rows and then
totaling the resulting sums in the marginal column labeled f(y);
(2) by adding the entries along the columns and then totaling the
results in the marginal row labeled f(x); (3) by adding the entries
in the cells in any order whatsoever. Hence, the following notation,

$$\sum_y \sum_x f(x, y) = \sum_x \sum_y f(x, y) = \sum_{x,y} f(x, y) = N, \tag{20}$$

will denote, respectively, the above-named procedures or orders in
summing. From (19) and (20) we have

$$N = \sum_y f(y) = \sum_x f(x) = \sum_{x,y} f(x, y). \tag{21}$$


We may call f(x) and f(y) the marginal distributions of x and y,
respectively. A correlation table with cell frequencies f(x, y)
uniquely determines the marginal totals f(x) and f(y). The
converse, however, is false. For example, we might replace the four
cell frequencies in the upper right-hand corner of Table 32 by the cell
frequencies

    2   2
    2   4

without disturbing the marginal totals.
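
Relations (19) to (21) can be illustrated directly on Table 32. The sketch below is an illustration only: the list layout, the helper name `marginals`, and the particular altered corner are choices made here, not the author's. It computes both marginal distributions, confirms the three ways of obtaining N, and shows a second table with different cells but the same margins.

```python
# Cell frequencies f(x, y) of Table 32.
# Rows: y = 92, 87, 82, 77, 72, 67, 62; columns: x = 67, 72, 77, 82, 87, 92, 97.
F = [
    [0, 0, 0, 1, 2, 3, 1],
    [0, 0, 1, 3, 8, 1, 5],
    [4, 4, 6, 4, 9, 1, 0],
    [3, 3, 7, 6, 4, 0, 0],
    [2, 3, 5, 6, 1, 1, 0],
    [3, 2, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]


def marginals(table):
    """Return (f_x, f_y): column totals and row totals, as in (19)."""
    f_y = [sum(row) for row in table]          # sum over x for each y
    f_x = [sum(col) for col in zip(*table)]    # sum over y for each x
    return f_x, f_y


f_x, f_y = marginals(F)
N = sum(f_y)                                   # equals sum(f_x), as in (21)

# Alter the upper right-hand 2 x 2 corner (an illustrative choice of values)
# so that its row and column sums are unchanged.
G = [row[:] for row in F]
G[0][5], G[0][6] = 2, 2
G[1][5], G[1][6] = 2, 4

print(f_x, f_y, N)
print(marginals(G) == marginals(F), G != F)
```

The two tables differ in four cells yet share every marginal total, which is the sense in which the marginal distributions do not determine the cell frequencies.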


11. Means and Variances. We will now express the means in
terms of this notation, taking first the mean of x's. From the
fundamental definition, we must multiply each x by its corresponding
frequency in the cells and sum the results, taking the products in any
order whatsoever. Hence,

$$\bar{x} = \frac{1}{N} \sum_{x,y} x f(x, y).$$

This may also be written

$$\bar{x} = \frac{1}{N} \sum_x \sum_y x f(x, y) = \frac{1}{N} \sum_x x \sum_y f(x, y) = \frac{1}{N} \sum_x x f(x).$$

Observe that the x may be moved to the left of $\sum_y$ in the second
expression because x is treated as a constant in a summation
performed with respect to y.

Similarly, we have

$$\bar{y} = \frac{1}{N} \sum_{x,y} y f(x, y) = \frac{1}{N} \sum_y \sum_x y f(x, y) = \frac{1}{N} \sum_y y \sum_x f(x, y) = \frac{1}{N} \sum_y y f(y).$$

The student will observe that the last expression for the mean in each
case is identical with that given for a frequency distribution of one
variable, when allowance is made for the necessity of distinguishing
between variables.

Any column is an x array of y's, so the symbol $\bar{y}_x$ is appropriate
for the mean of a column. Similarly, $\bar{x}_y$ denotes the mean of a y
array of x's, i.e., of a row. We may now state the following theorem.¹

Theorem IV. The mean $\bar{y}$ for the whole table (in the y-direction)
is equal to the mean of the values $\bar{y}_x$ for the several columns when
each $\bar{y}_x$ is weighted with the frequency in that column.

Proof: We are required to show that

$$\frac{1}{N} \sum_x f(x)\, \bar{y}_x = \bar{y}$$

where

$$\bar{y}_x = \frac{1}{f(x)} \sum_y y f(x, y).$$

Upon substituting in the first equation the value of $\bar{y}_x$ as given by the
second equation, we have

$$\frac{1}{N} \sum_x \sum_y y f(x, y) = \frac{1}{N} \sum_{x,y} y f(x, y) = \bar{y}.$$

It is suggested that the student state and prove a similar theorem
concerning $\bar{x}$.

In this new notation, the definitions of the variances become

$$\sigma_x^2 = \frac{1}{N} \sum_{x,y} (x - \bar{x})^2 f(x, y) = \frac{1}{N} \sum_x x^2 f(x) - \bar{x}^2;$$

$$\sigma_y^2 = \frac{1}{N} \sum_{x,y} (y - \bar{y})^2 f(x, y) = \frac{1}{N} \sum_y y^2 f(y) - \bar{y}^2.$$

¹ This is actually the same as Theorem IX on page 45, but it seems
worthwhile to state and prove it in the new notation.
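
As a check on the formulas of this section, the following sketch (an illustration with names chosen here, not the author's procedure) computes the means and variances of Table 32 from the marginal distributions and verifies Theorem IV by weighting the column means with the column frequencies.

```python
xs = [67, 72, 77, 82, 87, 92, 97]      # mid-values of x (columns)
ys = [92, 87, 82, 77, 72, 67, 62]      # mid-values of y (rows)
F = [                                   # f(x, y) of Table 32
    [0, 0, 0, 1, 2, 3, 1],
    [0, 0, 1, 3, 8, 1, 5],
    [4, 4, 6, 4, 9, 1, 0],
    [3, 3, 7, 6, 4, 0, 0],
    [2, 3, 5, 6, 1, 1, 0],
    [3, 2, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]

f_y = [sum(row) for row in F]
f_x = [sum(col) for col in zip(*F)]
N = sum(f_x)

# Means from the marginal distributions.
x_bar = sum(x * f for x, f in zip(xs, f_x)) / N
y_bar = sum(y * f for y, f in zip(ys, f_y)) / N

# Variances from the second (computing) form of each definition.
var_x = sum(x * x * f for x, f in zip(xs, f_x)) / N - x_bar ** 2
var_y = sum(y * y * f for y, f in zip(ys, f_y)) / N - y_bar ** 2

# Column means y_bar_x, then their weighted mean (Theorem IV).
col_means = [
    sum(y * F[i][j] for i, y in enumerate(ys)) / f_x[j]
    for j in range(len(xs))
]
weighted = sum(f * m for f, m in zip(f_x, col_means)) / N

print(x_bar, y_bar, round(var_x, 2), round(var_y, 2), weighted)
```

The weighted mean of the seven column means reproduces the mean of y for the whole table, as the theorem requires.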

Exercises 

1. Evaluate the following expressions in Table 32.

(a) For x = 82,

$$\sum_y f(x, y), \quad \sum_y y f(x, y), \quad f(x), \quad \bar{y}_x.$$

(b) For y = 87,

$$\sum_x f(x, y), \quad \sum_x x f(x, y), \quad f(y), \quad \bar{x}_y.$$

2. Refer to Table 27 (Chapter V) and let x be the number of a column. Express
the answers in the third and second lines from the bottom of the table in
terms of the notation of this section. Thus for x = 1,

$$\bar{y}_1 = \frac{1}{f(x)} \sum_y y f(x, y) = \frac{1}{7}\left[85 + (75)2 + (65)2 + (55)2\right] = 67.86.$$

12. Computation of Means. Just as in the case of a one-way
frequency distribution it was found convenient to choose an
arbitrary origin and take the class interval as the unit, so we now do
likewise. Let

$$u = \frac{1}{h}(x - x_0); \quad \text{i.e.,} \quad x = uh + x_0. \tag{22}$$

Hence,

$$\bar{x} = \bar{u}h + x_0 \tag{23}$$

where

$$\bar{u} = \frac{1}{N} \sum_u u f(u).$$

Likewise, let

$$v = \frac{1}{k}(y - y_0); \quad \text{i.e.,} \quad y = vk + y_0, \tag{24}$$

whence

$$\bar{y} = \bar{v}k + y_0, \tag{25}$$

where

$$\bar{v} = \frac{1}{N} \sum_v v f(v).$$

Then a suitable form for computing the means of the x's and y's
is as follows:



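        u      -3    -2    -1     0     1     2     3
   v    y      67    72    77    82    87    92    97   f(v)  v f(v)
   3   92       -     -     -     1     2     3     1      7      21
   2   87       -     -     1     3     8     1     5     18      36
   1   82       4     4     6     4     9     1     -     28      28
   0   77       3     3     7     6     4     -     -     23       0
  -1   72       2     3     5     6     1     1     -     18     -18
  -2   67       3     2     -     -     -     -     -      5     -10
  -3   62       1     -     -     -     -     -     -      1      -3
f(x) = f(u)    13    12    19    20    24     6     6    100      54
  u f(u)      -39   -24   -19     0    24    12    18    -28

(The second header row gives the mid-values x of the columns; the column
headed y gives the mid-values of the rows; f(v) = f(y).)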



Computations:

$$\bar{u} = \frac{1}{N} \sum u f(u) = \frac{-28}{100} = -.28,$$

whence

$$\bar{x} = 82 + 5(-.28) = 80.6;$$

$$\bar{v} = \frac{1}{N} \sum v f(v) = .54,$$

whence

$$\bar{y} = 77 + 5(.54) = 79.7.$$

In the table f(v) = f(y) and f(u) = f(x) because u and v are merely different ways
of describing the cells but in no way change the frequencies in those cells.
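
The coded computation can be written out as follows. This is a sketch only: the variable names are chosen here, and the origin (x₀ = 82, y₀ = 77) and class intervals (h = k = 5) are those used in the table above.

```python
xs = [67, 72, 77, 82, 87, 92, 97]
ys = [92, 87, 82, 77, 72, 67, 62]
f_x = [13, 12, 19, 20, 24, 6, 6]       # marginal distribution f(x) = f(u)
f_y = [7, 18, 28, 23, 18, 5, 1]        # marginal distribution f(y) = f(v)
N = sum(f_x)

x0, h = 82, 5                           # arbitrary origin and class interval for x
y0, k = 77, 5                           # arbitrary origin and class interval for y

us = [(x - x0) // h for x in xs]        # coded values, equation (22)
vs = [(y - y0) // k for y in ys]        # coded values, equation (24)

u_bar = sum(u * f for u, f in zip(us, f_x)) / N
v_bar = sum(v * f for v, f in zip(vs, f_y)) / N

x_bar = u_bar * h + x0                  # equation (23)
y_bar = v_bar * k + y0                  # equation (25)

print(us, vs)
print(u_bar, v_bar, x_bar, y_bar)
```

The coded means come out as −.28 and .54, and decoding them gives the same means as a direct computation from the x and y mid-values.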


13. Computation of r. In the expressions of §10 and §11 the
(u, v) coordinates could have been used instead of (x, y). The use of
the former simplifies the computation of r. A preliminary discussion
of certain expressions will help in understanding the formula for r
to be used for a correlation table. Let us consider first the following
expression:

(a)  $\sum_{u,v} u v f(u, v)$.

This means: multiply the f in each cell by the u and v coordinates of
that cell and add the results, proceeding from cell to cell over the
whole table in any order whatsoever. But it may be more
convenient to proceed in a definite order, say down the columns. Then
(a) becomes

(b)  $\sum_u \sum_v u v f(u, v) = \sum_u u \sum_v v f(u, v)$.

The expression $\sum_v v f(u, v)$ in the right member of (b) means: for
any u (i.e., for any column), multiply each f by its own v and add
the results. Let us denote this sum by V. Then the right member
of (b) means: multiply the V for each column by the u of that
column and add the results, proceeding from column to column
(i.e., summing in the u-direction). We may also obtain the same
result as in (a) by proceeding along the rows. Thus (a) may be
written

(c)  $\sum_v \sum_u u v f(u, v) = \sum_v v \sum_u u f(u, v)$.

The expression $\sum_u u f(u, v)$ means: for any v (i.e., for any row),
multiply each f in the row by its own u and add the results. If we
call this sum U, then the right member of (c) means: multiply
the U for each row by the v for that row and add the results,
proceeding from row to row (i.e., summing in the v-direction).

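We are now ready to derive the formula for r.

Since we are now dealing with a frequency distribution, the
fundamental definition of r becomes

$$r = \frac{\dfrac{1}{N} \sum_{x,y} (x - \bar{x})(y - \bar{y}) f(x, y)}{\sigma_x \sigma_y}. \tag{26}$$

From (22) and (23), we have

$$(x - \bar{x}) = h(u - \bar{u}),$$

and from (24) and (25),

$$(y - \bar{y}) = k(v - \bar{v}).$$

Since (x, y) and (u, v) are merely different notations for the same
cell, we have

$$f(x, y) = f(u, v).$$

For computing purposes, the standard deviations are defined as
follows:

$$\sigma_x = h\sigma_u, \quad \sigma_u^2 = \frac{1}{N} \sum_u u^2 f(u) - \bar{u}^2; \qquad \sigma_y = k\sigma_v, \quad \sigma_v^2 = \frac{1}{N} \sum_v v^2 f(v) - \bar{v}^2.$$

Therefore, (26) becomes

$$r = \frac{\dfrac{1}{N} \sum_{u,v} (u - \bar{u})(v - \bar{v}) f(u, v)}{\sigma_u \sigma_v} = \frac{\dfrac{1}{N} \sum_{u,v} u v f(u, v) - \bar{u}\bar{v}}{\sigma_u \sigma_v}.$$

If now we let

$$U = \sum_u u f(u, v) \quad \text{and} \quad V = \sum_v v f(u, v),$$

then since

$$\sum_{u,v} u v f(u, v) = \sum_v v \sum_u u f(u, v) = \sum_u u \sum_v v f(u, v),$$

the above expression for r may be written in either of the following
ways:

$$r = \frac{\dfrac{1}{N} \sum_v vU - \bar{u}\bar{v}}{\sigma_u \sigma_v} = \frac{\dfrac{1}{N} \sum_u uV - \bar{u}\bar{v}}{\sigma_u \sigma_v}. \tag{29}$$

The fact that

$$\sum_v vU = \sum_u uV$$

serves as a check in the table.

The above procedure is illustrated in Table 35.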

Explanation: The table is self-explanatory except possibly the U and V entries.
Recalling that $U = \sum_u u f(u, v)$, the first entry in the U column is obtained
from the sum of the following products: 0·1 + 1·2 + 2·3 + 3·1 = 11; the second
entry from −1·1 + 0·3 + 1·8 + 2·1 + 3·5 = 24. Since $V = \sum_v v f(u, v)$, the first
entry in the V row is obtained from 1·4 + 0·3 + (−1)·2 + (−2)·3 + (−3)·1 = −7.
Similarly for the other entries.

Table 35. Computation of r for Data of Table 32

        u      -3    -2    -1     0     1     2     3
   v    y      67    72    77    82    87    92    97   f(v)  vf(v)  v²f(v)     U    vU
   3   92       -     -     -     1     2     3     1      7     21      63    11    33
   2   87       -     -     1     3     8     1     5     18     36      72    24    48
   1   82       4     4     6     4     9     1     -     28     28      28   -15   -15
   0   77       3     3     7     6     4     -     -     23      0       0   -18     0
  -1   72       2     3     5     6     1     1     -     18    -18      18   -14    14
  -2   67       3     2     -     -     -     -     -      5    -10      20   -13    26
  -3   62       1     -     -     -     -     -     -      1     -3       9    -3     9
   f(u)        13    12    19    20    24     6     6    100     54     210         115
 u f(u)       -39   -24   -19     0    24    12    18    -28
 u²f(u)       117    48    19     0    24    24    54    286
   V           -7    -3     3     7    30    11    13
   uV          21     6    -3     0    30    22    39    115

Computations:

$$\sigma_u^2 = \frac{1}{N} \sum u^2 f(u) - \bar{u}^2 = 2.86 - (-.28)^2 = 2.7816, \qquad \sigma_u = \sqrt{2.7816} = 1.67.$$

$$\sigma_v^2 = \frac{1}{N} \sum v^2 f(v) - \bar{v}^2 = 2.10 - (.54)^2 = 1.8084, \qquad \sigma_v = \sqrt{1.8084} = 1.34.$$

Therefore from (29) we have

$$r = \frac{1.15 - (-.28)(.54)}{(1.67)(1.34)} = 0.58.$$
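
The whole of Table 35 can be reproduced in a few lines. The sketch below is illustrative (the variable names are chosen here): it forms the U and V entries, applies the check that the sum of vU equals the sum of uV, and evaluates r by formula (29).

```python
import math

us = [-3, -2, -1, 0, 1, 2, 3]           # coded x (columns)
vs = [3, 2, 1, 0, -1, -2, -3]           # coded y (rows)
F = [                                    # f(u, v) of Table 35
    [0, 0, 0, 1, 2, 3, 1],
    [0, 0, 1, 3, 8, 1, 5],
    [4, 4, 6, 4, 9, 1, 0],
    [3, 3, 7, 6, 4, 0, 0],
    [2, 3, 5, 6, 1, 1, 0],
    [3, 2, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]

f_v = [sum(row) for row in F]
f_u = [sum(col) for col in zip(*F)]
N = sum(f_u)

# U: for each row, the sum of u * f(u, v); V: for each column, the sum of v * f(u, v).
U = [sum(u * f for u, f in zip(us, row)) for row in F]
V = [sum(v * f for v, f in zip(vs, col)) for col in zip(*F)]

sum_vU = sum(v * big_u for v, big_u in zip(vs, U))
sum_uV = sum(u * big_v for u, big_v in zip(us, V))

u_bar = sum(u * f for u, f in zip(us, f_u)) / N
v_bar = sum(v * f for v, f in zip(vs, f_v)) / N
sigma_u = math.sqrt(sum(u * u * f for u, f in zip(us, f_u)) / N - u_bar ** 2)
sigma_v = math.sqrt(sum(v * v * f for v, f in zip(vs, f_v)) / N - v_bar ** 2)

r = (sum_vU / N - u_bar * v_bar) / (sigma_u * sigma_v)   # formula (29)

print(U, V, sum_vU, sum_uV)
print(round(sigma_u, 2), round(sigma_v, 2), round(r, 2))
```

Because h and k cancel in (29), the coded values alone suffice; the class interval and origin are needed only when the means and standard deviations of x and y themselves are wanted.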
14. Remarks on Computation of r. (a) Sign of r. It should be
observed that the sign of r depends on the choice of the positive
direction along each coordinate axis. In Table 35 the origin of
reference is chosen so that the data occur in the first quadrant and the
directions on the (x, y)-axes are the conventional ones. These
directions were preserved in changing to (u, v) coordinates. If we
had reversed the direction of the v-axis by labeling the y values
larger than y = 77 by v = −1, −2, −3, and those less than y = 77
by v = 1, 2, 3, the sign of r would be changed. But if the directions
of both u and v were reversed, the sign of r would be unchanged.

(b) Grouping errors. When N is small, say less than 100, and
the data are grouped into cells, grouping errors are introduced. In
general, the fewer cells used, the greater the errors. These may be
corrected, in part, by applying Sheppard's corrections to $\sigma_u$ and
$\sigma_v$. However, this will not be insisted upon in this course.

(c) Commercial charts. Computations can be expedited by the
use of commercially prepared correlation charts. Several types of
chart are available on the market. In her book (reference 15),
Professor Helen M. Walker explains the merits of two of these which
are recommended. She also gives the following advice to beginners:
“A chart is not a crutch to help the novice. It is a means of
speeding up operations after they are well understood.”

Exercises 

1. By equation (29), show that r is independent of the choice of origin and of 

the units of measurement. 

2. In Table 35, evaluate the following sums: 

L/(". 2), £/( 2 , »), £uf(u, i), Z>/(- 2, v), »), 2Z »«»/(«, u) 

M V U V U V U f V 

77- Z/(“. *) if » = 0. 

3. Derive (29). 

4. For the table on page 200, find r and $\bar{x}$, $\bar{y}$, $\sigma_x$, $\sigma_y$.
Note that $x_0$, $y_0$, h, and k do not need to be determined to compute r, but
are required for the means and standard deviations of x and y.

15. Regression Lines for a Correlation Table. The data of a
correlation table may be thought of as dots lying many deep at the
centers of the several cells. There are, of course, f(x, y) of these in
any cell whose coordinates are (x, y), and f(x) is the total number of
dots in a vertical column whose coordinate is x. Suppose now we

Heights and Weights of 200 Freshmen
(Heights to Nearest .1 Inch; Weights to Nearest .5 Pound)


X 

90- 
99. £ 


i 

!H 

1 

n 

1 

- 

i 

i 

m 

I 

I 


- 200- 
209.5 

m 

76- 

77.9 




1 







1 


1 

74- 







1 

1 

1 

1 



4 

72- 




1 

1 

1 , 

4 


1 




8 

70- 



1 

2 

6 

B 

6 

2 

L 

2 

1 

1 

29 

68- 



2 

8 


D 

9 

2 

1 

i 

1 


49 

66- 


i 

8 

16 

14 

13 

6 

2 

1 



1 

61 

64- 


3 

8 

7 

i. 

7 

3 

3 

1 

1 




33 

62- 

1 

4 

1 

7 

1 

9 

a 



i 

■ 

S 

14 

60- 











■ 

■ 

0 

58- 

59.9 


1 




1 

1 



1 

i 

■ 

1 

/<*) 

1 

8 

20 

42 

46 

32 

29 

8 

6 

4 

2 

2 

200 


Ans. $\bar{x}$ = 138.45 lbs.; $\bar{y}$ = 67.82 in.;
$\sigma_x$ = 19.6 lbs.; $\sigma_y$ = 2.8 in.;
r = 0.48.


replace all the data in each column by an equal number of data
concentrated at the mean of that column. If we denote the ordinate of
this mean point by $\bar{y}_x$, we have

$$\bar{y}_x = \frac{1}{f(x)} \sum_y y f(x, y). \tag{30}$$

Hence, $\bar{y}_x f(x)$ represents the totality of all the values in a column.

For each of the columns there will be a value of (30). Taking the
hypothesis that the mean points of the several columns lie
approximately on a straight line, $\bar{y}_x = m_1 x + k$, we may find $m_1$ and k
under a least-squares criterion of approximation. If, in applying the
criterion, the square of the difference between the observed mean,
${}_o\bar{y}_x$, and the computed mean, ${}_c\bar{y}_x$, for each array, viz.,
$(\bar{y}_x - m_1 x - k)^2$, is weighted with the number f(x) in the array, it
turns out that we get the same values for $m_1$ and k which we obtained
when we fitted the regression line of y on x to the scatter diagram.

In proving this, the student of calculus¹ would have an easy task
in obtaining the normal equations:

$$\sum_x (\bar{y}_x - m_1 x - k) f(x) = 0, \qquad \sum_x (\bar{y}_x - m_1 x - k)\, x f(x) = 0, \tag{31}$$

whose simultaneous solution yields the desired values of $m_1$ and k.

¹ Differentiating partially $\sum_x f(x)(\bar{y}_x - m_1 x - k)^2$ with respect to
$m_1$ and k respectively, and setting the results equal to zero, yields
equation (31). Instead of differentiating this expression one may expand it,
regard the result as a quadratic in both $m_1$ and k, and use the theorem of
§3, Chapter VII, to obtain (31).

Expanding (31), we have

$$\sum_x \bar{y}_x f(x) - m_1 \sum_x x f(x) - k \sum_x f(x) = 0,$$
$$\sum_x x \bar{y}_x f(x) - m_1 \sum_x x^2 f(x) - k \sum_x x f(x) = 0. \tag{32}$$

Since

$$\sum_x \bar{y}_x f(x) = \sum_x \sum_y y f(x, y) = N\bar{y},$$

and

$$\sum_x x \bar{y}_x f(x) = \sum_x x \sum_y y f(x, y) = \sum_{x,y} x y f(x, y),$$

equation (32) becomes

$$N\bar{y} - m_1 N\bar{x} - Nk = 0,$$
$$\sum_{x,y} x y f(x, y) - m_1 \sum_x x^2 f(x) - kN\bar{x} = 0. \tag{33}$$

Solving (33) for $m_1$ and k we find

$$k = \bar{y} - m_1 \bar{x},$$

$$m_1 = \frac{\sum_{x,y} x y f(x, y) - N\bar{x}\bar{y}}{\sum_x x^2 f(x) - N\bar{x}^2} = r\frac{\sigma_y}{\sigma_x},$$

and the equation of our line becomes

$$\bar{y}_x = r\frac{\sigma_y}{\sigma_x}\, x + \bar{y} - r\frac{\sigma_y}{\sigma_x}\, \bar{x},$$

that is

$$\bar{y}_x - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x}). \tag{8a}$$

Therefore, the best-fitting line for the means of the columns
properly weighted, and the best-fitting line for all the dots are one and
the same straight line. But from the point of view of a correlation
table, a regression line is to be regarded as the equation from which
may be estimated the average of all the y's associated with a particular
value of x. In other words, a prediction in the latter case professes
to give only the mean result (Figure 38).

Fig. 38. The Line of Regression of y on x Is the Best-Fitting Line for
the Means of the Columns
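
The equality of the two lines can be confirmed on Table 32. The following sketch is an illustration under the assumptions stated in its comments (the names are chosen here, and Table 32 is used because its cells are fully known): it solves the weighted normal equations (32) for the column means and compares the slope with r·σ_y/σ_x computed from all the cells.

```python
xs = [67, 72, 77, 82, 87, 92, 97]
ys = [92, 87, 82, 77, 72, 67, 62]
F = [                                    # f(x, y) of Table 32
    [0, 0, 0, 1, 2, 3, 1],
    [0, 0, 1, 3, 8, 1, 5],
    [4, 4, 6, 4, 9, 1, 0],
    [3, 3, 7, 6, 4, 0, 0],
    [2, 3, 5, 6, 1, 1, 0],
    [3, 2, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]
f_x = [sum(col) for col in zip(*F)]
N = sum(f_x)

# Column means, equation (30).
col_means = [sum(y * F[i][j] for i, y in enumerate(ys)) / f_x[j]
             for j in range(len(xs))]

# Sums appearing in the expanded normal equations (32).
S0 = N
S1 = sum(x * f for x, f in zip(xs, f_x))
S2 = sum(x * x * f for x, f in zip(xs, f_x))
T0 = sum(m * f for m, f in zip(col_means, f_x))
T1 = sum(x * m * f for x, m, f in zip(xs, col_means, f_x))

# Solve the two linear equations for the slope m1 and intercept k.
m1 = (S0 * T1 - S1 * T0) / (S0 * S2 - S1 ** 2)
k = (T0 - m1 * S1) / S0

# Slope of the regression of y on x from all the cells: covariance / variance of x,
# which equals r * sigma_y / sigma_x.
x_bar = S1 / N
y_bar = T0 / N
cov = sum((x - x_bar) * (y - y_bar) * F[i][j]
          for i, y in enumerate(ys) for j, x in enumerate(xs)) / N
var_x = S2 / N - x_bar ** 2
m_direct = cov / var_x

print(round(m1, 4), round(k, 2), round(m_direct, 4))
```

The two slopes agree to rounding error, and the intercept satisfies k = ȳ − m₁x̄, so the weighted line through the column means is the regression line of y on x.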

16. Applications. The data of a correlation table are usually
regarded as a sample of the much larger class of similar data
constituting the universe. A regression equation calculated from a limited
but representative sample may give valuable estimates of the average
values of y in the universe associated with designated values of x.

Let us consider the data of Table 36 on page 203. Suppose a
personnel manager in charge of hiring employees of a manufacturing
plant has instituted a system of mental tests for applicants, and has
gathered these data showing the relationship between the standing
made by applicants on their mental tests and their productive ability
when measured according to a certain standard of production after
they are hired.


Table 36 


* 

X 

22.5 

27.5 

32.5 

37.5 

42.5 

47.5 

52.5 

[" 

■ 



m 

fl 


-2 

-1 

B 

B 

2 


f(v) 

x v 

22 

a 


■ 

■ 




2 

M 

7 

47.5 

nn 

3 


BB 



1 

4 

s 

MM 

m 

48.1 

ua 

MM 


| 

s 


8 

11 

H 

B 

D 

45.9 

m 

1 


MM 

MM 


m 

9 

8 

B 

i 

1F1E1 

|S5 

0 

1 

■1 

MM 

■a 

m 

12 


B 

1 

EE3 

m 

-1 

MM 

■9 

MM 

MM 

16 


b 

wm 

i 

KIM 

[« 

-2 

MM 

D 

5 

8 

8 


i 


Bcgl 

EB3 

m 

cm 

2 

M3 

m 

4 

1 

1 



m 

E3 


7 



49 


54 

35 

B 


■ 


h 

67.9 

72.1 

81.9 



19 

IIQ 

gg 

■ 



Here x represents the grade made on the mental test, and y the per cent of
standard in production. (See also Table 27.) The means of columns are
denoted by $\bar{y}_x$, and the means of rows by $\bar{x}_y$.

In order to demonstrate to the company's management the
connection between his mental tests and the productivity of the
employees he has hired, the personnel manager does the following:

(1) Computes the coefficient of correlation between the two series;

(2) Shows what the estimated productivity of employees would be
whose grades in the mental test fell on the mid-points of the class
intervals of the mental test data.

The means of the columns and of the rows are given in the table.
In addition, he obtains the following results:

$$\bar{x} = 42.17, \quad \sigma_x = 8.40, \quad r = .417,$$
$$\bar{y} = 87.31, \quad \sigma_y = 17.41, \quad m_1 = r\frac{\sigma_y}{\sigma_x} = .864.$$

Therefore, the line of regression of y on x is

$$\bar{y}_x - 87.31 = .864(x - 42.17)$$

or

$$\bar{y}_x = .864x + 50.88. \tag{34}$$

This is the equation of the line that best fits the points which
designate the means of the columns (Figure 39). Hence, for an assigned
value of x, equation (34) gives the value of y which is the expected
mean of the column defined by the assigned value of x. The personnel
manager is thus prepared to predict the productivity of applicants
on the basis of their mental test grades. In other words, the
regression equation calculated from the records of those already hired may
be used in selecting from future applicants those most likely to
succeed.¹
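
Equation (34) can be applied mechanically. The short sketch below is an illustration only: the function name and the list of grades are chosen here, and the grades are the class mid-points 22.5, 27.5, and so on quoted in the exercises. It rebuilds the slope and intercept from the quoted statistics and evaluates the expected column mean for several grades.

```python
x_bar, y_bar = 42.17, 87.31
sigma_x, sigma_y, r = 8.40, 17.41, 0.417

m1 = round(r * sigma_y / sigma_x, 3)     # slope, rounded as in the text
k = round(y_bar - m1 * x_bar, 2)         # intercept, rounded as in the text


def predicted_productivity(grade):
    """Expected mean per cent of standard production for a test grade, by (34)."""
    return m1 * grade + k


for grade in (22.5, 27.5, 32.5, 37.5):
    print(grade, round(predicted_productivity(grade), 2))
```

Each value is the expected mean of a column, not a forecast for a single applicant; individual results scatter about it.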


Fig. 39. Means of Columns and Line of Regression of y on x for Table 36
(vertical axis y, horizontal axis x)

¹ The critical reader may doubt if the value r = .417 is sufficiently large to
warrant much confidence in (34) as a predicting equation. The question of
reliability of predictions is discussed later.

Exercises

1. Verify the value of r given for Table 36.

2. Verify the means of the columns given in Table 36.

3. Using equation (34) show what the estimated productivity of employees
in the factory referred to above would be whose mental test grades were
22.5, 27.5, etc.

4. For Table 35,

(a) Find the equations of the regression lines.




(b) Locate the axes through the mean of the table and graph the regression
lines.

(c) Compute $S_y$.

5. As in Exercise 4 for the table on page 200.

Ans. to (a):

$$\bar{y}_x = .069x + 58.3; \qquad \bar{x}_y = 3.36y - 89.4.$$

17. $S_y$ for a Correlation Table. For ungrouped data we have
defined $S_y$ as a measure of the clustering of the data around the
regression line, and have observed that it is called the standard error
of estimate. In order to understand what $S_y$ has to do with
“estimates” it is necessary first to consider its meaning in a correlation
table. Let us denote by $s_{y \cdot x}$ the standard error about the regression
line in the array of y's at x. Thus we have

$$s_{y \cdot x}^2 = \frac{1}{f(x)} \sum_y \left({}_o y - {}_c\bar{y}_x\right)^2 f(x, y) \tag{35}$$

where ${}_o y$ denotes an observed y value and ${}_c\bar{y}_x$ denotes the value
obtained from the regression line for that column. Thus, for the
column headed 32.5 in Table 36 we obtain the computed value
${}_c\bar{y}_x$ by substituting x = 32.5 in (34) whence we find ${}_c\bar{y}_x = 78.96$.
To evaluate $s_{y \cdot x}^2$ for this column we find the square of the deviation
of each of the 32 values of ${}_o y$ from 78.96, add the results and divide
by 32. Extracting the square root of the result we find $s_{y \cdot x} = 15.96$.
Moving along the regression line suppose we have computed an $s_{y \cdot x}^2$
for each array of y's and averaged the results. It is interesting to
learn that this average is $S_y^2$. This is stated more precisely in the
following theorem.

Theorem V. The arithmetic mean of the values of $s_{y \cdot x}^2$ for the several
columns when each $s_{y \cdot x}^2$ is weighted with the frequency in that column is
$S_y^2 = \sigma_y^2(1 - r^2)$.

Proof: Using (35) we have

$$\frac{1}{N} \sum_x f(x)\, s_{y \cdot x}^2 = \frac{1}{N} \sum_{x,y} \left(y - {}_c\bar{y}_x\right)^2 f(x, y).$$

Substituting the value given by (8a), §15, in the right member of the
above identity we have

$$\frac{1}{N} \sum_{x,y} \left[(y - \bar{y}) - r\frac{\sigma_y}{\sigma_x}(x - \bar{x})\right]^2 f(x, y)$$

which reduces to $\sigma_y^2(1 - r^2)$. It is left as an exercise for the student
to show this.
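
Theorem V can be verified numerically. The sketch below uses Table 32 rather than Table 36, since its cell frequencies are fully available; that choice and the variable names are assumptions made here for illustration. It computes s²(y·x) about the regression line for each column by (35), weights each with f(x), and compares the result with σ_y²(1 − r²).

```python
xs = [67, 72, 77, 82, 87, 92, 97]
ys = [92, 87, 82, 77, 72, 67, 62]
F = [                                    # f(x, y) of Table 32
    [0, 0, 0, 1, 2, 3, 1],
    [0, 0, 1, 3, 8, 1, 5],
    [4, 4, 6, 4, 9, 1, 0],
    [3, 3, 7, 6, 4, 0, 0],
    [2, 3, 5, 6, 1, 1, 0],
    [3, 2, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0],
]
f_x = [sum(col) for col in zip(*F)]
f_y = [sum(row) for row in F]
N = sum(f_x)

x_bar = sum(x * f for x, f in zip(xs, f_x)) / N
y_bar = sum(y * f for y, f in zip(ys, f_y)) / N
var_x = sum(x * x * f for x, f in zip(xs, f_x)) / N - x_bar ** 2
var_y = sum(y * y * f for y, f in zip(ys, f_y)) / N - y_bar ** 2
cov = sum((x - x_bar) * (y - y_bar) * F[i][j]
          for i, y in enumerate(ys) for j, x in enumerate(xs)) / N
r_sq = cov ** 2 / (var_x * var_y)
slope = cov / var_x                      # r * sigma_y / sigma_x, as in (8a)

# s^2_{y.x} for each column, measured from the regression line, equation (35).
s_sq = []
for j, x in enumerate(xs):
    y_line = y_bar + slope * (x - x_bar)
    s_sq.append(sum((y - y_line) ** 2 * F[i][j] for i, y in enumerate(ys)) / f_x[j])

weighted_mean = sum(f * s for f, s in zip(f_x, s_sq)) / N
S_y_sq = var_y * (1.0 - r_sq)

print(round(weighted_mean, 4), round(S_y_sq, 4))
```

The two printed values coincide, which is the content of the theorem for this table.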

For Table 36 we find $S_y = 15.83$. In Figure 40 the parallel lines
on either side of the regression line RR′ are drawn at a vertical
distance of $\pm S_y$ from it. They describe the average limits of
scatter above and below the regression line.

To connect $S_y$ with the reliability of predictions it is necessary to
introduce the concept of a correlation surface. Indeed, a knowledge of
the fundamental properties of a correlation surface is desirable for a
wider outlook on correlation theory in general.

18. Normal Correlation Surface. A correlation table may be
idealized into a surface in somewhat the same way that a histogram
is idealized into a frequency curve. The concept of a surface relates
to the universe from which the observed data of the table may be
regarded as a sample. Let the dimensions of the cells of a table be
$\Delta x$ and $\Delta y$, and suppose columns are erected upon these cells with
altitudes proportional to the frequencies in the cells. The result is
a sort of solid histogram. Then as $\Delta x \to 0$, $\Delta y \to 0$, $N \to \infty$, the
tops of the columns approach as a limit a smooth surface which is
called a correlation surface. Our discussion will be confined to the
case where we may assume that this limit is a normal correlation
surface. In discussing this surface it is convenient to let x and y
represent deviations from the respective means and to let z = f(x, y)
denote the frequency function representing the surface. Such a
surface is shown in Figure 41.

Any section of this surface parallel to the yz-plane is a normal
curve and represents the distribution in a column at x. Similarly
any section parallel to the xz-plane representing a row is a normal
curve. The frequency in a cell is measured by that portion of the
volume under the surface which lies over that cell. All those cells
in which the frequency is a fixed value lie on an ellipse. That is, if
contour lines are drawn on the surface joining the points of equal
height above the base they will be ellipses. In other words, sections
of the surface parallel to the xy-plane are ellipses.



We will digress here for a brief discussion of an ellipse. We may
think of an ellipse as a transitional figure between a circle and a
straight line, as the circle flattens out. That is to say, the limiting
form of an ellipse is a circle at one extreme of the flattening
process and a straight line segment at the other extreme.
The degree of flatness is called the eccentricity of the ellipse,
and it is proved in analytic geometry that the eccentricity
varies from zero in the case of a circle to unity when the ellipse
degenerates into a line. All ellipses having the same eccentricity
whatever their size have the same relative proportions and are
therefore similar in form.

The eccentricity of the elliptical contours of different normal
correlation surfaces varies with the amount of correlation existing in
the corresponding universe. A surface with narrow elliptical
contours represents a universe in which there is high correlation, whereas
if the variables are completely independent in the probability sense
the contour lines are circles when the variables are expressed in
standard units. If the variables are not expressed in standard units
(and r = 0) then the contour lines may be ellipses but their major
and minor axes will coincide with the x- and y-axes as in Figure 42.
When r ≠ 0 the axes of the ellipses make an angle with the xy-axes;
their major axis cuts quadrants I and III in the xy-plane if r > 0 (as
in Figure 41) and quadrants II and IV if r < 0.

19. Properties of Normal Bivariate Surface. The equation of a 
normal correlation surface is given by 

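(36)    $f(x, y) = K e^{-P}$ 

where 

$P = \frac{1}{2(1 - r^2)} \left( \frac{x^2}{\sigma_x^2} - \frac{2rxy}{\sigma_x \sigma_y} + \frac{y^2}{\sigma_y^2} \right)$, 

$K = N / (2\pi \sigma_x \sigma_y \sqrt{1 - r^2})$, and x and y represent the correlated 
variables referred to their respective means as origin. 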
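
The surface (36) can be evaluated directly. The following is a minimal sketch, not part of the original text: the function name and the parameter values (standard deviations 2 and 3, r = .5, N = 100) are assumptions chosen only for illustration. It shows that the height at the origin equals K, and that for r > 0 a point in quadrant I lies higher on the surface than its mirror image in quadrant IV.

```python
import math

def normal_surface(x, y, sigma_x, sigma_y, r, n=1.0):
    """Height of surface (36) at (x, y); x and y are deviations from the means."""
    q = 1.0 - r * r                       # requires |r| < 1
    k = n / (2.0 * math.pi * sigma_x * sigma_y * math.sqrt(q))
    p = (x * x / sigma_x ** 2
         - 2.0 * r * x * y / (sigma_x * sigma_y)
         + y * y / sigma_y ** 2) / (2.0 * q)
    return k * math.exp(-p)

# Hypothetical parameters, chosen only for illustration.
peak = normal_surface(0.0, 0.0, sigma_x=2.0, sigma_y=3.0, r=0.5, n=100)
same_sign = normal_surface(1.0, 1.0, 2.0, 3.0, 0.5, 100)
opposite_sign = normal_surface(1.0, -1.0, 2.0, 3.0, 0.5, 100)
print(peak, same_sign > opposite_sign)
```

With r = 0 and equal standard deviations the function gives equal heights at (1, 0) and (0, 1), which is the circular-contour case described above.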

By means of (36) an observed distribution may be fitted with the 
appropriate normal surface assuming that the sample might reason- 
ably have come from such a universe. This is accomplished by 
replacing $\sigma_x$, $\sigma_y$, r, and N in (36) by the corresponding statistics 
calculated from the sample and taking the origin at the mean of the 
table. Let us assume that an observed distribution has been gradu- 
ated by such a surface and the theoretical cell frequencies obtained. 
The surface extends to infinity in the xy-plane but contour ellipses 
can be obtained which will enclose any desired percentage of the 
given frequency when these ellipses are projected orthogonally onto 
the xy-plane. They are all concentric, similar, and similarly placed. 
Figure 43 represents such an ellipse, say the smallest one necessary 
to enclose all the given cells. The systems of perpendicular chords 
represent the columns and rows of the table. 

The graduated frequencies for each column are normal distri- 
butions whose means lie on the regression line of y on x and whose 
standard deviations are in each case given by $S_y = \sigma_y(1 - r^2)^{1/2}$. 
To state the same thing in a slightly different way, an array of y's 
corresponding to a fixed value $x_1$ of x is a normal distribution whose 
mean deviates from $\bar{y}$ by $r(\sigma_y/\sigma_x)x_1$ and whose standard deviation is 
$S_y = \sigma_y(1 - r^2)^{1/2}$, which is independent of $x_1$ and therefore is the 
same for all such arrays. Similarly an array of x's corresponding to 
a particular value $y_1$ of y is a normal distribution with a mean which 
deviates from $\bar{x}$ by $r(\sigma_x/\sigma_y)y_1$, and a standard deviation of 
$S_x = \sigma_x(1 - r^2)^{1/2}$, which is independent of $y_1$ and therefore is the same 
for all such arrays. A careful study of Figure 41 will help in under- 
standing what is meant by these statements. 

When the means $\bar{y}_x$ of the columns fall exactly on the regression 
line, $s_{y.x}$ becomes the standard deviation of a column and is therefore 
the same as $S_y$. Theorem V states that $S_y^2$ is an average of the values 
of $s_{y.x}^2$, but when all the quantities being averaged have the same 
value, as they do in the ideal case of the normal surface, their (mean) 
average is that value. When the standard deviations of the columns 
are equal, the regression system of y on x is called a homoscedastic 
system. In a universe where they are not equal the system is said to 
be heteroscedastic. For a homoscedastic system with linear regression, 
$S_y = \sigma_y(1 - r^2)^{1/2}$ is the standard deviation of each array of y's. 

20. Reliability of Predictions. In using a regression equation to 
make predictions we are naturally interested in the degree of con- 
fidence to be expected in the predictions thus made. The use of $S_y$ 
in this connection is based upon the properties of the normal cor- 
relation surface. 

Let us imagine the universe of which Table 36 is a sample and 
assume that it may be described by a normal surface. Confining 
our attention to a section parallel to the yz-plane in Figure 41 we 
know that an x array of y's is distributed normally about a value of 
y determined by a designated value of x in the regression equation 
of y on x. That is, the mean of this normal distribution is the 
predicted value of y and its standard deviation is $S_y$. The per- 
centage distribution of such an array is the same as that given in 
Figure 23 of Chapter VI, if $S_y$ is taken as the unit of measurement 
along the horizontal axis. But an estimate of $S_y$ is its value cal- 
culated from the sample. Moreover, for an observed distribution, 
we have seen that $S_y$ is the average standard deviation of the several 
columns and therefore it may reasonably be taken as an approxi- 
mation to the theoretical $S_y$ which in the universe is the same for 
all the columns. We also take the calculated regression equation 
as an approximation to the theoretical. 

By measuring deviations from the predicted value in terms of 
$S_y$ in the same way that $\sigma$ is used as a unit in measuring deviations 
from the mean, we may then enter a normal probability scale for 
the probability of a deviation involving multiples of $S_y$. Ac- 
cording to this scale the probability $P_y$ is about .68 for a 
deviation of $\pm S_y$ from the predicted value, and the chances 
are even for a deviation of $.6745\,S_y$ on either side of the 
predicted value. 

For Table 36 we have found $S_y = 15.83$ and for an applicant 
making x = 32.5 on the mental test we have predicted y = 78.96. 
Therefore the chances are about 68 in 100 that his percentage of 
productivity will be between 78.96 − 15.83 and 78.96 + 15.83, that 
is, between 63.13 and 94.79. In other words, 
the probability is about .68 that the predicted value will not be in 
error by more than 15.83. 
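
The arithmetic of this illustration may be sketched as follows. The function names are assumptions introduced here for illustration; the probability for a deviation within t standard units is taken from the normal curve through the error function of the Python standard library.

```python
import math

def prob_within(t):
    """Probability that a normal deviate falls within t standard units of its mean."""
    return math.erf(abs(t) / math.sqrt(2.0))

def prediction_limits(predicted, s_y, t=1.0):
    """Limits predicted -/+ t * S_y about a value predicted from the regression line."""
    return predicted - t * s_y, predicted + t * s_y

low, high = prediction_limits(78.96, 15.83)       # values quoted for Table 36
print(round(low, 2), round(high, 2))              # 63.13 94.79
print(round(prob_within(1.0), 4))                 # 0.6827
print(round(prob_within(0.6745), 3))              # 0.5
```

The same function gives the probability for any other multiple of $S_y$; for example, `1 - prob_within(3.0)` is about .0027.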

To summarize, in a normal bivariate universe each array is a 
normal distribution and therefore its mean coincides with its mode. 
Since regression is linear, a value predicted from the regression equa- 
tion of y on x is the mean value of y for a designated value of x. 
Then $P_y = \int_{-t}^{t} \phi(t)\,dt$, where $\phi(t)$ is the normal frequency function 
in standard units, is the probability for a deviation from the 
predicted value $\bar{y}_x$ as small as |t|, where t is expressed in units of the 
standard error $S_y$ of a column. Thus, 

$t = \frac{y - \bar{y}_x}{S_y}$. 

Then $1 - P_y$ is the probability for a deviation as large as |t|. Simi- 
larly, when dealing with the regression line of x on y, 
$P_x = \int_{-t}^{t} \phi(t)\,dt$ is 
the probability for a deviation from the predicted value as small 
as |t|, where now $t = (x - \bar{x}_y)/S_x$. 

Fig. 44. Representing an x Array of y's and Deviations of $\pm S_y$ from a 
Predicted Value of y 


Exercises 

1. Refer to problem 4, §4. Assume that the data given there are obtained 
from a correlation table which is a representative sample from a normal 
bivariate universe describing the heights and weights of senior men stu- 
dents in colleges and universities of the United States. Then a value 
predicted from the regression equation of y on x will give the mean of the 
“column” at x. Similarly, for an assigned y, the corresponding x in the 
regression equation of x on y will be the mean of the “row” at y. Under 
this assumption, determine the probability that Doe's height is outside 
the interval 65.75 to 77.75 inches. What are the chances that Roe will 
be between 100.8 and 122.4 pounds in weight? 

Ans. $1 - P_y = .0027$, $P_x = .5$ (approximately). 

2. Discuss the reliability of the predictions which you made in Exercise 3, §16. 

Outline of Solution. Suppose a reliability level of $P_y = .5$ is desired. Mak- 
ing the necessary assumptions, this allows a deviation of t = ±.6745. 
Since $S_y = 15.83$ we have 

$\pm .6745 = \frac{d}{15.83}$, 

where $d = y - \bar{y}_x$. That is, $y = \bar{y}_x \pm$ ? For x = 37.5, y = ? ± ? 
So the probability is .5 that the standard of production will be between 
what limits for a person making x = 37.5 on the mental test? The 
problem is analogous for any other designated value of $P_y$ and for other 
assigned values of x. 



3. Consider the surface represented by (36). Prove that a section of the sur- 
face parallel to the yz coordinate plane is a normal curve with its mean 
on the regression line of y on x and with variance $S_y^2 = \sigma_y^2(1 - r^2)$. 
Outline of Solution. Write (36) in the form 

(a)    $f = K e^{-P}$, 

where $P = (u^2 - 2ruv + v^2)/2(1 - r^2)$, $u = x/\sigma_x$, $v = y/\sigma_y$, $\bar{x} = 0 = \bar{y}$. 
The trace of the surface in the plane $u = u_1$ is determined by substituting 
$u_1$ for u in (a). This substitution yields the result 

(b)    $f = C e^{-T}$, 

where $T = (v - ru_1)^2/2(1 - r^2)$, $C = K e^{-u_1^2/2}$. Upon returning to (x, y) 
coordinates, (b) becomes 

(c)    $f = C e^{-h^2 (y - m)^2}$, 

where $m = r x_1 \sigma_y/\sigma_x$, $h^2 = 1/(2S_y^2)$, $S_y^2 = \sigma_y^2(1 - r^2)$. 
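
The result of this exercise can be checked numerically before it is proved. The sketch below is an illustration under assumed values (standard deviations 2 and 3, r = .6, and the section at $x_1$ = 1.5); it sums the heights of surface (36) along that section and computes the mean and standard deviation of y from them. The constant K is omitted because it cancels in both quantities.

```python
import math

def column_section(x1, sigma_x, sigma_y, r, half_width=12.0, steps=24000):
    """Mean and standard deviation of y along the section x = x1 of surface (36)."""
    q = 1.0 - r * r
    dy = 2.0 * half_width * sigma_y / steps
    total = first = second = 0.0
    for i in range(steps + 1):
        y = -half_width * sigma_y + i * dy
        p = (x1 ** 2 / sigma_x ** 2
             - 2.0 * r * x1 * y / (sigma_x * sigma_y)
             + y ** 2 / sigma_y ** 2) / (2.0 * q)
        w = math.exp(-p)          # height of the surface, apart from the factor K
        total += w
        first += w * y
        second += w * y * y
    mean = first / total
    return mean, math.sqrt(second / total - mean * mean)

# Hypothetical parameters, chosen only for illustration.
mean, sd = column_section(x1=1.5, sigma_x=2.0, sigma_y=3.0, r=0.6)
print(round(mean, 6), round(sd, 6))
```

The computed mean should agree with $r x_1 \sigma_y/\sigma_x$ = 1.35 and the standard deviation with $\sigma_y(1 - r^2)^{1/2}$ = 2.4, whatever value of $x_1$ is chosen.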

21. Non-Linear Regression. Correlation Ratio. We have seen 
that the regression systems of a normal correlation surface are linear. 
In a correlation table which is a representative sample from a normal 
bivariate universe the means of the arrays would lie approximately 
on straight lines. But in correlation tables which are samples of 
other types of universes, regression might not be linear. Moreover, 
one of the regression curves might be strictly linear and the other 
non-linear. The following numerical example illustrates the latter 
possibility. 

In this example, the regression of y on x is linear whereas that of x on 
y is non-linear. 

When the means of the columns (or of the rows) do not lie approx- 
imately on a straight line, the use of r may be misleading because 
r = 0 indicates absence of linear correlation only and not necessarily 
absence of correlation in general. 

One of the best treatments of this situation is that given in the 
Carus Monograph on Mathematical Statistics, which will be repro- 
duced substantially here. 

In introducing a correlation ratio $\eta_{yx}$ (eta) of y on x, as an appropriate measure 
of correlation to take the place of the correlation coefficient in such a situation, 
we may get suggestions as to what is appropriate by solving for r in (10). This 
gives 

(37)    $r^2 = 1 - \frac{S_y^2}{\sigma_y^2}$, 

where we may recall that $S_y^2$ is the mean square of deviations from the line of 
regression. Then 

$r = \pm \left[ 1 - \frac{S_y^2}{\sigma_y^2} \right]^{1/2}$. 

This formula could be used appropriately as a definition of r in place of our 
definition in (1), and its examination may throw further light on the significance 
of r. When $S_y = 0$, the formula gives $r = \pm 1$ and, as we have seen earlier, 
all the dots of the scatter diagram must then fall exactly on the line of regression. 
When $S_y = \sigma_y$, the formula gives r = 0, and the regression line is in this case of 
no aid in predicting the value of y from assigned values of x. In the formula 
$r^2 = 1 - S_y^2/\sigma_y^2$ it is important to keep in mind that the mean square deviation 
$S_y^2$ is from the line of regression. Next, let $S_y'^2$ be the corresponding mean 
square of deviations from the means of columns. Then $S_y'^2 = S_y^2$ when the 
regression is strictly linear, but $S_y'^2 \neq S_y^2$ when the regression is non-linear. 
This fact suggests the use of a formula closely related to $[1 - S_y^2/\sigma_y^2]^{1/2}$ for a 
measure of non-linear regression by replacing $S_y$ by $S_y'$. We then write 

(38)    $\eta_{yx}^2 = 1 - \frac{S_y'^2}{\sigma_y^2}$, 

where $\eta_{yx}$ is the correlation ratio of y on x, and $S_y'^2$ is the mean square of devia- 
tions from the means of the columns whether these means are near to or far 
from the line of regression. 

In general, we may say that the correlation ratio of y on x is a measure of the 
clustering of dots about the means of columns. 

An analogous discussion for the rows obviously leads to 

$\eta_{xy}^2 = 1 - \frac{S_x'^2}{\sigma_x^2}$, 

giving $\eta_{xy}^2$, the square of the correlation ratio of x on y. 

That $\eta_{yx}^2 \le 1$ and that the equality holds only when all the dots in each 
column are at the mean of the column follows at once from (38). 

That $\eta_{yx}^2 \ge r^2$ may be shown by recalling the meanings of $S_y^2$ in (37) and 
of $S_y'^2$ in (38). A mean square of deviations in each column is a minimum when 
the deviations are taken from the mean of the array. Hence, the $S_y'^2$ in (38) 
must be equal to or less than $S_y^2$ in (37) for the same data, since the deviations 
in (37) are measured from the line of regression. Hence, we have shown that 

$1 \ge \eta_{yx}^2 \ge r^2$. 

Moreover, when the regression of y on x is linear, $\eta_{yx}^2 - r^2$ found from the sample 
differs from zero by an amount not greater than the fluctuations due to random 
sampling. Hence, $\eta_{yx}^2 - r^2$ becomes a criterion for testing the linearity of 
the regression of y on x. 

For computational purposes, it is desirable to express the correlation ratios 
in a form involving the standard deviations of the means of arrays. For this 
purpose, let $\bar{y}_x$ be the mean of any column of y's and $\sigma_{\bar{y}_x}$ the standard deviation 
of the means of columns when the square $(\bar{y}_x - \bar{y})^2$ of each deviation is weighted 
with the number f(x) in the column. Then it follows that 

(39)    $\eta_{yx}^2 = \frac{\sigma_y^2 - S_y'^2}{\sigma_y^2} = \frac{\sigma_{\bar{y}_x}^2}{\sigma_y^2}$. 

That is, the correlation ratio of y on x is the ratio of the standard deviation of 
the means of columns to the standard deviation of all y's.¹ 

To prove (39) we must show that $\sigma_y^2 - S_y'^2 = \sigma_{\bar{y}_x}^2$. We begin 
by observing that the concentration of the dots in a column about 
their mean may be measured in terms of their standard deviation. 
Let $\sigma_{y.x}$ denote the standard deviation of the y's in the column at x. 
That is, 

(40)    $\sigma_{y.x}^2 = \frac{1}{f(x)} \sum_y (y - \bar{y}_x)^2 f(x, y)$. 

Now, the concentration of the dots in the entire table about the 
means of the columns may be measured by finding the mean value 
of all such expressions $\sigma_{y.x}^2$ for all the columns of the table. But 
since there are more points in some columns than in others, it will be 
desirable to weight the $\sigma_{y.x}^2$ for each column by multiplying it by 
the number of points or dots in the column. It is this weighted 
mean value of the $\sigma_{y.x}^2$'s which we have denoted by $S_y'^2$. That is, 

(41)    $S_y'^2 = \frac{1}{N} \sum_x f(x)\, \sigma_{y.x}^2$. 

In order to verify (39) we must now show that 

$\sigma_y^2 = S_y'^2 + \sigma_{\bar{y}_x}^2$. 

Adapting (14) of §9, Chapter V, to the notation of this chapter, 
we have 

(42)    $N\sigma_y^2 = \sum_x f(x)\, \sigma_{y.x}^2 + \sum_x f(x)(\bar{y}_x - \bar{y})^2$. 

This follows from the fact that N is composed of the several sub- 
distributions f(x) in the columns, and $\sigma_{y.x}$ is the standard deviation 
of a column about its mean $\bar{y}_x$. It is obvious that 

$\frac{1}{N} \sum_x f(x)(\bar{y}_x - \bar{y})^2$ 

gives the variance $\sigma_{\bar{y}_x}^2$ of the means of the columns. The above 
expression (42) then becomes 

$N\sigma_y^2 = N S_y'^2 + N\sigma_{\bar{y}_x}^2$, 

which reduces to $\sigma_y^2 - S_y'^2 = \sigma_{\bar{y}_x}^2$, and hence we obtain (39). 
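
The identity just proved may be verified on any correlation table. The sketch below uses a small hypothetical table (the frequencies are invented for illustration and are not taken from the text) and exact fractions, so that the total variance of y can be compared with the sum of the within-column and between-column parts without rounding error.

```python
from fractions import Fraction

def decompose_variance(freq):
    """Return (sigma_y^2, S_y'^2, variance of column means) for a table {(x, y): f}."""
    n = sum(freq.values())
    mean_y = Fraction(sum(f * y for (x, y), f in freq.items()), n)
    var_y = sum(f * (y - mean_y) ** 2 for (x, y), f in freq.items()) / n
    col_n, col_sum = {}, {}
    for (x, y), f in freq.items():
        col_n[x] = col_n.get(x, 0) + f            # f(x), the column frequency
        col_sum[x] = col_sum.get(x, 0) + f * y
    col_mean = {x: Fraction(col_sum[x], col_n[x]) for x in col_n}
    within = sum(f * (y - col_mean[x]) ** 2 for (x, y), f in freq.items()) / n
    between = sum(col_n[x] * (col_mean[x] - mean_y) ** 2 for x in col_n) / n
    return var_y, within, between

# Hypothetical frequencies f(x, y), for illustration only.
table = {(1, 1): 2, (1, 2): 1, (2, 2): 1, (2, 3): 3, (3, 1): 1, (3, 4): 2}
var_y, within, between = decompose_variance(table)
print(var_y, within, between, var_y == within + between)
```

Here `within` plays the part of $S_y'^2$ and `between` the part of $\sigma_{\bar{y}_x}^2$, so that `between / var_y` is the square of the correlation ratio by (39).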

22. Computation of $\eta^2$. It should be instructive to compute 
$\eta_{yx}^2$ for Table 36, by both relations (38) and (39). 

¹ Rietz, Carus Monograph on Mathematical Statistics, p. 89 et seq. 

For (38) we have the following: 

$\eta_{yx}^2 = 1 - \frac{S_y'^2}{\sigma_y^2}$, where $\sigma_{y.x}^2 = \frac{1}{f(x)} \sum_y (y - \bar{y}_x)^2 f(x, y)$. 

    $\sigma_{y.x}^2$      f(x) 
    106.12*       7 
    191.83       14 
    246.48       32 
    283.63       49 
    257.65       55 
    294.51       54 
    222.53       35 
     71.43       14 

$S_y'^2 = 246.45$ 
$\sigma_y^2 = (17.41)^2 = 303.11$ 
$\eta_{yx}^2 = 1 - \frac{246.45}{303.11} = .1869$. 

For (39) we have the following: 

$\eta_{yx}^2 = \frac{\sigma_{\bar{y}_x}^2}{\sigma_y^2}$, with $\bar{y} = 87.31$. 

    $\bar{y}_x$       f(x) 
     67.86        7 
     72.14       14 
     81.87       32 
     84.80       49 
     85.73       55 
     90.92       54 
     95.57       35 
    105.00       14 

$\sigma_{\bar{y}_x}^2 = 56.66$ (see Exercise 3, p. 97). 
$\eta_{yx}^2 = \frac{56.66}{303.11} = .1869$. 

* See Table 27 and Table 16. 

In verifying (39) for this example we have $\sigma_y^2 - S_y'^2$ = 303.11 − 
246.45 = 56.66 and $\sigma_{\bar{y}_x}^2 = 56.66$. 
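
Both computations can be repeated from the column variances, column means, and column frequencies quoted for Table 36. The following is a sketch only; the variable names are assumptions, and small differences in the last figure are to be expected because the tabulated values are themselves rounded.

```python
# Column variances, column means and frequencies quoted above for Table 36.
col_var = [106.12, 191.83, 246.48, 283.63, 257.65, 294.51, 222.53, 71.43]
col_mean = [67.86, 72.14, 81.87, 84.80, 85.73, 90.92, 95.57, 105.00]
freq = [7, 14, 32, 49, 55, 54, 35, 14]

n = sum(freq)
var_y = 17.41 ** 2                               # sigma_y^2

# Relation (38): weighted mean of the column variances.
s_prime_sq = sum(f * v for f, v in zip(freq, col_var)) / n
eta_sq_38 = 1.0 - s_prime_sq / var_y

# Relation (39): weighted variance of the column means.
mean_y = sum(f * m for f, m in zip(freq, col_mean)) / n
var_means = sum(f * (m - mean_y) ** 2 for f, m in zip(freq, col_mean)) / n
eta_sq_39 = var_means / var_y

print(n, round(s_prime_sq, 2), round(mean_y, 2))
print(round(eta_sq_38, 3), round(eta_sq_39, 3))
```

The two values of $\eta_{yx}^2$ agree to three decimal places, as the identity (39) requires.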

The above illustrations are useful in giving an understanding of 
the meaning of $\eta_{yx}^2$. However, for computational purposes, another 
formula may be derived which involves less labor than either (38) 
or (39). In fact, the computation of a correlation ratio may be very 
conveniently performed by an easy extension of a correlation table. 
The derivation of the appropriate formula will now be given. 

The standard deviation ($\sigma_{\bar{y}_x}$) of the means of the columns may be 
expressed in the (u, v) units by the relation $\sigma_{\bar{y}_x}^2 = k^2 \sigma_{\bar{v}_u}^2$, 
where 

$\sigma_{\bar{v}_u}^2 = \frac{1}{N} \sum_u f(u)\, \bar{v}_u^2 - \bar{v}^2$, 

which is the definition of the standard deviation of the variable $\bar{v}_u$. 
This is apparent if we observe that the mean for the whole table in 
the v-direction ($\bar{v}$) is the mean of the quantities $\bar{v}_u$ for the several 
columns.¹ Since 

$\bar{v}_u = \frac{1}{f(u)} \sum_v v f(u, v) = \frac{V_u}{f(u)}$, 

we have 

$\sigma_{\bar{v}_u}^2 = \frac{1}{N} \sum_u \frac{V_u^2}{f(u)} - \bar{v}^2$. 

Recalling that $\sigma_y^2 = k^2 \sigma_v^2$, we have 

$\eta_{yx}^2 = \frac{\sigma_{\bar{y}_x}^2}{\sigma_y^2} = \frac{k^2 \sigma_{\bar{v}_u}^2}{k^2 \sigma_v^2}$, 

that is, 

(43)    $\eta_{yx}^2 = \frac{1}{\sigma_v^2} \left[ \frac{1}{N} \sum_u \frac{V_u^2}{f(u)} - \bar{v}^2 \right]$. 

An analogous discussion for the rows of x's leads to 

(44)    $\eta_{xy}^2 = \frac{1}{\sigma_u^2} \left[ \frac{1}{N} \sum_v \frac{U_v^2}{f(v)} - \bar{u}^2 \right]$, 

giving the square of the correlation ratio of x on y. 

¹ $\frac{1}{N} \sum_u f(u)\, \bar{v}_u = \frac{1}{N} \sum_u f(u) \frac{1}{f(u)} \sum_v v f(u, v) = \frac{1}{N} \sum_u \sum_v v f(u, v) = \frac{1}{N} \sum_v v f(v) = \bar{v}$. 



Example. Find $\eta_{yx}^2$ for Table 35. Solution: Referring to this table and 
using (43) we obtain the following results: 

    $V_u^2$          49      9      9     49    900    121    169     Sum 
    $V_u^2/f(u)$      …    .75    .47   2.45  37.50  20.17  28.17   93.29 

$\bar{v}^2 = .2916$, $\sigma_v^2 = 1.8084$, N = 100. 

$\eta_{yx}^2 = \frac{1}{1.8084} \left[ \frac{1}{100}(93.29) - .2916 \right]$, 

$\eta_{yx}^2 = .3546$. 
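
The final step of this example is a direct substitution in (43). A minimal sketch follows; the function name and argument names are assumptions introduced here, and the four numbers are those quoted in the example.

```python
def eta_sq_from_column_sums(sum_v2_over_f, n, mean_v_sq, var_v):
    """Formula (43): sum_v2_over_f is the total of V_u^2 / f(u) over the columns."""
    return (sum_v2_over_f / n - mean_v_sq) / var_v

eta_sq = eta_sq_from_column_sums(93.29, n=100, mean_v_sq=0.2916, var_v=1.8084)
print(round(eta_sq, 4))     # 0.3546
```

Because the class interval k cancels in (43), the whole computation can be carried out in the coded (u, v) units of the correlation table.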

It may be well to mention that the value of $\eta$ is not independent 
of the classification of the data. As the class intervals become 
narrower, $\eta$ approaches unity. This may be understood from (38). 
If the grouping were so fine that only one item appeared in each 
column, then it would constitute the mean of that column. In this 
case $S_y'$ would be zero and $\eta$ would therefore be unity. On the other 
hand, a very coarse grouping tends to make the value of $\eta$ approach r. 
“Student” has given a formula for The Correction to be Made in the 
Correlation Ratio for Grouping in Biometrika, vol. IX, pp. 316-320. 
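
The dependence of $\eta$ on the grouping can be seen in a small experiment. The six pairs of values below are hypothetical, invented only for illustration; the sketch computes $\eta_{yx}^2$ from (38), first with class intervals of width 1 in x and then with each item in a column of its own.

```python
import math
from collections import defaultdict

def correlation_ratio_sq(column_labels, ys):
    """eta_yx^2 from (38): one minus the ratio of S_y'^2 to sigma_y^2."""
    n = len(ys)
    mean_y = sum(ys) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    columns = defaultdict(list)
    for label, y in zip(column_labels, ys):
        columns[label].append(y)
    within = 0.0
    for col in columns.values():
        m = sum(col) / len(col)
        within += sum((y - m) ** 2 for y in col)
    return 1.0 - (within / n) / var_y

# Hypothetical observations, for illustration only.
xs = [1.2, 1.7, 2.1, 2.8, 3.4, 3.9]
ys = [2.0, 3.0, 2.5, 4.0, 3.5, 5.0]

grouped = correlation_ratio_sq([math.floor(x) for x in xs], ys)   # width-1 classes
ungrouped = correlation_ratio_sq(xs, ys)                          # one item per column
print(round(grouped, 4), ungrouped)
```

With one item in each column every deviation from the column mean is zero, so the second value is exactly 1 whatever the data.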

23. Further Discussion. Test for Linearity of Regression. Let 
us consider the totality of mean points $(x, \bar{y}_x)$ of the columns and 
think of a curve connecting them. Of course, for a table of observed 
data, it is possible to draw many such curves. In order to show 
clearly why a comparison of $\eta^2$ and $r^2$ is the basis of a test for linearity 
of regression, it will be necessary to consider a theoretical table in 
which there is only one such curve. When we speak of the regression 
curve we are thinking, not of the given table in which the dimensions 
of the cells are h and k, but of an ideal table in which there is an 
infinity of cells of zero dimensions. To put it another way, consider 
a sample of N pairs of values $(x_i, y_i)$ from which a correlation table 
is made with cells whose dimensions are h and k. If parallelepipeds 
are erected on the cells with heights proportional to the frequencies, 
the result is a solid histogram bounded by a broken surface. As 
h → 0, k → 0, and N → ∞, this histogram will approach some solid, 
bounded by a smooth surface. An example of such a surface is the 
normal correlation surface. In such an ideal table, it is possible to 
have but one curve connecting the means of the columns. This 
curve is sometimes styled the true regression curve of y on x. In an 
analogous way for the means of the rows there would be a true 
regression curve of x on y. It is one of these curves that we have in 
mind when we speak of “the regression curve” or “the regression.” 
For a normal bivariate universe (represented by a normal correlation 
surface), regression is linear. But for other types of bivariate 
universes (which might be represented by skew surfaces), it is 
conceivable that regression might be parabolic or exponential or 
some other type of curve. In such types, regression is said to be 
non-linear. The curve which is chosen to approximate the true 
regression curve must not be confused with the true regression 
curve. The latter notion relates to the ideal universe from which 
the data at hand are a sample. It is defined as the locus of the 
mean points of the columns of the theoretical table. When we fit a 
curve to the means of the columns of an observed table, this 
regression curve is merely an approximation to the ideal set up in 
the definition. Similar statements may be made about the 
regression of x on y. 

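We will now recapitulate the expressions used in the comparative 
analysis of $r^2$ and $\eta_{yx}^2$ for an observed table. 

(45)    $\sigma_{y.x}^2 = \frac{1}{f(x)} \sum_y (y - \bar{y}_x)^2 f(x, y)$, 
        $S_y'^2 = \frac{1}{N} \sum_x \sigma_{y.x}^2 f(x)$, 
        $\eta_{yx}^2 = 1 - \frac{S_y'^2}{\sigma_y^2} = \frac{\sigma_{\bar{y}_x}^2}{\sigma_y^2}$. 

(46)    $s_{y.x}^2 = \frac{1}{f(x)} \sum_y (y - {}_c y_x)^2 f(x, y)$, 
        $S_y^2 = \frac{1}{N} \sum_x s_{y.x}^2 f(x)$, 
        $r^2 = 1 - \frac{S_y^2}{\sigma_y^2}$. 

Here ${}_c y_x$ denotes the ordinate at x of the line of regression. 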

Recall that $\sigma_{y.x}^2$ is defined as the variance in a column and therefore 
as the square of the standard error about the regression curve, what- 
ever it may be, which goes through the means of the columns. $S_y'^2$ 
is an average of the $\sigma_{y.x}^2$ values, and $\eta_{yx}^2$ is defined in terms of $S_y'$. 
Correspondingly, $s_{y.x}^2$ is the square of the standard error in a column 
about the line which best fits the means of the columns. $S_y^2$ is an 
average of the $s_{y.x}^2$ values, and $r^2$ is defined in terms of $S_y$. If 
regression is linear, the means of the columns will fall on the “best- 
fitting line” and $\sigma_{y.x}^2$ becomes the same as $s_{y.x}^2$. Then $S_y'^2 = S_y^2$, 
and hence $\eta_{yx}^2 = r^2$. 

It is interesting to observe that $\sigma_{y.x}^2$ is the second moment about 
the mean, for an array of y's, i.e., for a column. In the notation 
of moments it could be denoted by $\mu_{2:y.x}$. In this notation, $s_{y.x}^2$ 
could be denoted by $\nu_{2:y.x}$, being the second moment in an array 
of y's about a point other than its mean. Since $\mu_2 \le \nu_2$, it follows 
that $\sigma_{y.x}^2 \le s_{y.x}^2$. Therefore $S_y'^2 \le S_y^2$ and $\eta_{yx}^2 \ge r^2$. If each y 
value of a column is at the mean of that column then it is 
obvious that $\sigma_{y.x}^2$ will be zero. In this case, $S_y' = 0$, and $\eta_{yx}^2 = 1$. 
On the other hand, for any column, the contribution of $\sigma_{y.x}^2 f(x)$ to 
$S_y'^2$ cannot exceed its contribution to $\sigma_y^2$. Taking the weighted 
mean of the respective contributions over all the columns, we have 
$S_y'^2 \le \sigma_y^2$ and hence 

$0 \le \eta_{yx}^2 \le 1$. 


Writing (38) in the form 

$S_y' = \sigma_y(1 - \eta_{yx}^2)^{1/2}$ 

we see that $S_y'$ is a measure of dispersion about the regression curve 
(which is the locus of the means) corresponding to $S_y = \sigma_y(1 - r^2)^{1/2}$, 
which is the standard error about the “best” line. If $r^2 = 1$, then 
y is related to x by a linear function. If $\eta_{yx}^2 = 1$, it follows that y 
is a single-valued function of x. On the other hand, if $r^2 = 0$, it does 
not necessarily follow that there is no relation¹ between y and x. If 
$\eta_{yx}^2 = 0$ then $r^2 = 0$, but if $r^2 = 0$ it does not necessarily follow that 
$\eta_{yx}^2 = 0$. 

¹ See H. L. Rietz, “On Functional Relations for which the Coefficient of Corre- 
lation is Zero.” Journal American Statistical Association, vol. 16, 1919, pp. 472- 
76. 

In the ideal table, regression of y on x is linear if and only if 
$\eta_{yx}^2 - r^2 = 0$. But in the case of an observed table, allowance must 
be made for sampling fluctuations. A corresponding analysis could 
be made for $r^2$ and $\eta_{xy}^2$, and $\eta_{xy}^2 - r^2$ computed from the sample should 
differ from zero by an amount not greater than the fluctuations due 
to chance, if regression of x on y is linear. The question naturally 
arises, what discrepancy between the computed values of $\eta^2$ and $r^2$ 
may be tolerated before we conclude that regression is non-linear? 
This problem has been investigated, and Blakeman¹ has proposed a 
testing formula. If certain assumptions are made, a simple though 
approximate test may be deduced from Blakeman's formula. Ac- 
cording to this approximate test if 

(47)    $N(\eta^2 - r^2) < 11.4$ 

then linear regression may be assumed. Since there are two $\eta^2$'s there 
are two tests. It is possible for one of the regression curves to be 
linear and the other not. 

Evaluating (47) for Table 35 we obtain $100[.3546 - (.58)^2] = 1.82$, 
so the regression of y on x may be assumed to be linear. 
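
The criterion (47) is easily mechanized. The sketch below is illustrative only; the function names are assumptions, and the limit 11.4 is the one stated in (47).

```python
def blakeman_statistic(n, eta_sq, r):
    """Left member of (47): N times the excess of eta^2 over r^2."""
    return n * (eta_sq - r * r)

def passes_blakeman(n, eta_sq, r, limit=11.4):
    """True when (47) is satisfied, so that linear regression may be assumed."""
    return blakeman_statistic(n, eta_sq, r) < limit

value = blakeman_statistic(100, 0.3546, 0.58)     # figures quoted for Table 35
print(round(value, 2), passes_blakeman(100, 0.3546, 0.58))
```

For Table 35 the statistic is 1.82, well under the limit.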

R. A. Fisher has shown that the Blakeman test is not very reliable. 
One can easily construct an example for which regression is obviously 
non-linear yet which satisfies the criterion (47). Consider the fol- 
lowing table: 

    y \ x    1    2    3    4    5 
      3      0    0    1    0    0 
      2      0    1    0    1    0 
      1      1    0    0    0    1 

Here, N = 5, $\sum xy = 27$, $\bar{x} = 3$, $\bar{y} = 9/5$. From (3), therefore, 
r = 0. From (40) and (41), $S_y' = 0$ and $\eta_{yx} = 1$. Applying (47), 
we have $N(\eta^2 - r^2) = 5(1 - 0) = 5 < 11.4$, so that 
Blakeman's test yields a verdict of linear regression of y on x. It 
appears that Blakeman's criterion is of doubtful utility. A more 
efficient method of testing linearity of regression is given in Part II. 
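
The figures for this table can be confirmed by direct computation. In the sketch below the five items are entered as the pairs (1, 1), (2, 2), (3, 3), (4, 2), (5, 1), the arrangement with one item in each column that agrees with N = 5, $\sum xy = 27$, $\bar{x} = 3$, and $\bar{y} = 9/5$. Exact fractions are used so that r and $\eta$ come out without rounding; the function name is an assumption.

```python
from fractions import Fraction

# One item in each column: (x, y) -> frequency.
table = {(1, 1): 1, (2, 2): 1, (3, 3): 1, (4, 2): 1, (5, 1): 1}

def r_squared_and_eta_squared(freq):
    """Return (r^2, eta_yx^2) for a correlation table {(x, y): f}."""
    n = sum(freq.values())
    mx = Fraction(sum(f * x for (x, y), f in freq.items()), n)
    my = Fraction(sum(f * y for (x, y), f in freq.items()), n)
    var_x = sum(f * (x - mx) ** 2 for (x, y), f in freq.items()) / n
    var_y = sum(f * (y - my) ** 2 for (x, y), f in freq.items()) / n
    cov = sum(f * (x - mx) * (y - my) for (x, y), f in freq.items()) / n
    col_n, col_sum = {}, {}
    for (x, y), f in freq.items():
        col_n[x] = col_n.get(x, 0) + f
        col_sum[x] = col_sum.get(x, 0) + f * y
    within = sum(f * (y - Fraction(col_sum[x], col_n[x])) ** 2
                 for (x, y), f in freq.items()) / n
    return cov ** 2 / (var_x * var_y), 1 - within / var_y

r_sq, eta_sq = r_squared_and_eta_squared(table)
statistic = sum(table.values()) * (eta_sq - r_sq)
print(r_sq, eta_sq, statistic, statistic < Fraction(114, 10))
```

The regression of y on x rises and then falls, yet the statistic is only 5, so the criterion (47) is satisfied.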

Exercises 

1. Using (43) and (44) find $\eta_{yx}^2$ and $\eta_{xy}^2$ for the table referred to in Exercise 4, 
page 221. Apply the test (47) and state your opinion about the linearity 
of regressions. 

¹ See Handbook of Mathematical Statistics, Rietz and others, p. 131. 

2. In the following table, x = Interest Rates, 4-6 months Commercial Paper; 
y = Total Bills Discounted by Federal Reserve Banks (1923-1932). Find 
r and $\eta_{yx}^2$. Form an opinion about linearity of regression of y on x. (Data 
from Elements of Statistics, Davis and Nelson, page 288.) 


Class 

Marks 

V 











7 







1 

6 

6 

6 

6 





1 

2 

3 

4 



5 





1 

3 

1 

2 



4 

- 




2 


9 

4 

1 



3 


1 

2 

1 

4 

9 

4 




2 


1 



11 

5 

i 




1 

4 


2 

3 

3 

1 





0 

2 

3 

3 

5 

3 






Class 











Marks 

0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

X 












3. In §44, Statistical Methods for Research Workers, R. A. Fisher writes: “The 
sum of the squares of the deviations of all the values of y from their gen- 
eral mean may be broken up into two parts, one representing the sum of 
the squares of the deviations of the means of the arrays from the general 
mean, each multiplied by the number in the array, while the second is the 
sum of the squares of the deviations of each observation from the mean of 
the array in which it occurs.” [Compare with our (14a) of Chapter V.] 
Prove Fisher's statement. Hint. In symbols, you are to prove that 

$V = v_1 + v_2$ 

where 

$V = \sum_{x, y} (y - \bar{y})^2 f(x, y)$, 

$v_1 = \sum_x (\bar{y}_x - \bar{y})^2 f(x)$, 

$v_2 = \sum_{x, y} (y - \bar{y}_x)^2 f(x, y)$. 

4. Prove that $\eta_{yx}^2$ is the ratio between $v_1$ and V as defined in Exercise 3. 

5. The mortality experience during the early years of an insurance company 
presents an interesting study in correlation. The following table shows 
for male lives the correlation between the age (x) of the insured at issue 
of policy and his age (y) at death. Data of the Midland Life Insurance 
Company,¹ 1906-1924. 

Find r, the two $\eta$'s, and the equations of the lines of regression. 

¹ From a paper On Certain Applications of Mathematical Statistics to Actuarial 
Data in The Record, American Institute of Actuaries, vol. XIII, Part II, 
No. 28, November, 1924. 

24. Correlation from Ranks. Before defining rank we will find the 
variance of the difference, z, between corresponding values of two 
variables. Let x and y denote corresponding values of two series each 
consisting of N variates. Form a third series z where $z_i = x_i - y_i$. 
Then the mean of z is given by $\bar{z} = \bar{x} - \bar{y}$ and the variance 
of z is, by definition, 

$\sigma_z^2 = \frac{1}{N} \sum z^2 - \bar{z}^2$. 

Replacing z by its equal x − y, we have 

$\sigma_z^2 = \frac{1}{N} \sum (x^2 - 2xy + y^2) - \bar{x}^2 - \bar{y}^2 + 2\bar{x}\bar{y}$ 

$= \left\{ \frac{1}{N} \sum x^2 - \bar{x}^2 \right\} - 2 \left\{ \frac{1}{N} \sum xy - \bar{x}\bar{y} \right\} + \left\{ \frac{1}{N} \sum y^2 - \bar{y}^2 \right\}$. 

Whence 

(48)    $\sigma_z^2 = \sigma_x^2 - 2r\sigma_x\sigma_y + \sigma_y^2$. 

If the variables x and y are uncorrelated, we have as a special case 

$\sigma_z^2 = \sigma_x^2 + \sigma_y^2$. 

Solving (48) for r, we obtain 

(49)    $r = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_z^2}{2\sigma_x\sigma_y}$. 

This is another expression for the correlation coefficient and involves 
standard deviations only. In particular, it may be used to advantage 
when x and y denote ranks, where by rank we mean order of magni- 
tude or importance. That is, rank refers to the position of a variate 
in an arrangement. 

If x and y denote the ranks of the same item with respect to two 
characteristics, and no ranks are omitted, and there are no duplica- 
tions of ranks, then both x and y refer to the integers from 1 to N. 
Therefore, $\bar{x} = \bar{y}$, and $\sigma_x^2 = \frac{1}{12}(N^2 - 1) = \sigma_y^2$. See Theorem VI, 
Chapter V. Moreover, 

$\sigma_z^2 = \frac{1}{N} \sum (x - y)^2 - (\bar{x} - \bar{y})^2 = \frac{1}{N} \sum (x - y)^2$, since $\bar{x} - \bar{y} = 0$. 

Let R denote the correlation coefficient when x and y refer to ranks 
rather than variates. Then (49) becomes 

$R = \frac{\frac{1}{6}(N^2 - 1) - \frac{1}{N} \sum (x - y)^2}{\frac{1}{6}(N^2 - 1)}$, 

which simplifies into 

(50)    $R = 1 - \frac{6 \sum (x - y)^2}{N(N^2 - 1)}$. 

This is known as Spearman's formula for rank correlation. 

If two or more variates are tied it is customary to divide the 
corresponding rank numbers among the variates concerned, using 
fractions if necessary. 
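
The rule for ties can be sketched in a few lines. The function below is an illustration, not a prescribed procedure: it ranks from the largest value downward and gives each group of tied values the mean of the rank numbers they occupy. The test values are the years of service of the twelve salesmen in Exercise 2 of this section.

```python
def ranks_with_ties(values, largest_first=True):
    """Rank the values; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=largest_first)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + 1 + end + 1) / 2.0     # mean of ranks pos+1 ... end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

years = [5, 2, 10, 8, 6, 4, 12, 2, 7, 5, 9, 3]    # salesmen A to L
print(ranks_with_ties(years))
```

Salesmen A and J, with 5 years each, occupy ranks 7 and 8 and so each receives 7.5; the next individual is ranked 9.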

Example. Suppose we have the following scores made in two tests, arranged 
in the order of their rank. Find the correlation between ranks. 

                 1st Subject           2nd Subject 
    Individual   Score   Rank = x      Score   Rank = y      x − y    (x − y)² 
        A          92        1           85        2           −1         1 
        B          86        2           76        4           −2         4 
        C          84        3           93        1            2         4 
        D          78        4           68        6           −2         4 
        E          71        5           67        7           −2         4 
        F          69        6           83        3            3         9 
        G          66        7           54        9           −2         4 
        H          58        8           7?        5            3         9 
        I          53        9           43       10           −1         1 
        J          45       10           59        8            2         4 
      Total                                             Σ(x − y) = 0     44 

We find $R = 1 - \frac{6(44)}{10(99)} = .733$. 
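
The computation in this example follows directly from (50). A minimal sketch, with the function name an assumption and the ranks those of individuals A to J in the example:

```python
def spearman_rank_correlation(rank_x, rank_y):
    """Formula (50): R = 1 - 6 * sum (x - y)^2 / (N (N^2 - 1))."""
    n = len(rank_x)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))

rank_x = list(range(1, 11))                  # ranks in the first subject
rank_y = [2, 4, 1, 6, 7, 3, 9, 5, 10, 8]     # ranks in the second subject
print(round(spearman_rank_correlation(rank_x, rank_y), 3))    # 0.733
```

Identical rankings give R = 1, and completely reversed rankings give R = −1.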


Exercises 

1. Suppose $z_i = x_i + y_i$. How would this change formulas (48) and (49)? 

2. Twelve salesmen are ranked in order of merit for efficiency by their manager. 
They are also ranked in accordance with their length of service. What 
indication is there of a relation between length of service and efficiency? 
(Garrett.) 

    Salesmen    Years of    Order of Merit    Order of Merit 
                Service       (Service)          (Effic.) 
       A           5             7.5                 6 
       B           2            11.5                12 
       C          10             2                   1 
       D           8             4                   9 
       E           6             6                   8 
       F           4             9                   5 
       G          12             1                   2 
       H           2            11.5                10 
       I           7             5                   3 
       J           5             7.5                 7 
       K           9             3                   4 
       L           3            10                  11 

The fractions in the third column denote ties in rank. Thus, A and J each 
served 5 years and each is ranked 7.5. The next individual is ranked 9. 

Ans. R = .80. 


3. Find R for the following data: 

            Rank    Score     Rank    Score 
       A      1       92        2       88 
       B      2       89        4       85 
       C      3       87        1       93 
       D      4       86        6       79 
       E      5       83        7       70 
       F      6       77        3       87 
       G      7       71        9       52 
       H      8       62        5       84 
       I      9       53       10       41 
       J     10       40        8       64 

Ans. R = .733. 

25. Interpretation. Common Elements. Although statistical 
theory gives a description of the indicated relationship between two 
related variables, the interpretation of the results “abound in pitfalls 
easily overlooked by the unwary, while they are cantering gaily 
along upon their arithmetic.” 

The methodological side has been developed until we can find correlation coeffi- 
cients by simply turning a crank, but the explanation of the meaning of the result 
after we find it, needs a brain. ... No amount of mathematical training and 
ability can take the place of the judgment and common sense that comes from 
a knowledge of the field in which the problem lies.¹ 

In the interpretation of r one should avoid imputing any causal 
relationship between the variables. In this connection the following 
pungent remarks of Professor E. B. Wilson 2 may be appropriately 
quoted: 

Correlation is a mutual affair between two numerical variables; the correlation
coefficient r is symmetrical with respect to them. Strictly, y is not correlated
with x or x with y, but x and y are correlated. Theory is very important in
indicating what facts should be looked for as significant; facts are significant
or important largely as they indicate theory, but neither compels the other, as
the histories of theorizing and of fact finding amply demonstrate . . . Further,
the value of the correlation coefficient depends on the group for which it is
determined or on the universe of which that group is a fair sample. The
correlation coefficient r of height and weight for a group containing humans
from infancy to adult life would be different from, and in fact greater than,
the coefficient for college students or for the members of a football squad;
there is no such thing as the correlation coefficient per se.

If the student has mastered the underlying mathematical theory 
he should be able to understand and profit by the interpretations 
given by the writers in his particular field of interest. As a final 
aid in forming a conception of its meaning, we state a theorem which 
gives to r a meaning in pure chance. If x and y are affected by s 
equally likely causes of which t are common to both, then r = t/s. 

Theorem VI. An urn containing white and black balls is so maintained
that in drawing a ball the probability of getting a white ball is a
constant p and that of getting a black ball is q (= 1 - p). The first
drawing of a pair of drawings is to consist of s balls taken one at a time
from the urn. The second drawing is to consist of s balls of which t are
taken at random from the s first drawn, and s - t are drawn one at a time
from the urn. Then the correlation coefficient between the numbers of
white balls in the two drawings is t/s.

1 Crathorne in Journal of the American Statistical Association, vol. 26,
Supplement, March, 1931, p. 27.

2 Correlation and Association, Journal of the American Statistical Association,
vol. 26 (1931), pp. 250-256.

As an illustration of the theorem we will take s = 5, t = 3, p = 1/4.
Let x be the number of white balls in the first drawing and y the
number of white balls in the second drawing. Then Table 37,
constructed by the theory of probability, 1 exhibits the a priori
frequencies when we use as small numbers as possible for frequencies
subject to the condition that each frequency is to be an integer.

Table 37. A Priori Frequencies

  y \ x       0       1       2       3       4       5      f(y)

    5         0       0       0       9       6       1        16
    4         0       0      81     108      45       6       240
    3         0     243     648     432     108       9      1440
    2       243    1620    1728     648      81       0      4320
    1      1458    3159    1620     243       0       0      6480
    0      2187    1458     243       0       0       0      3888

  f(x)     3888    6480    4320    1440     240      16    16,384


According to the theorem the correlation coefficient should be t/s = 3/5.
It is left as an exercise for the student to show, by computing r from
the table, that this is actually the case.
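
As a cross-check on the exercise, the a priori frequencies can be generated directly from the drawing scheme of Theorem VI. This is a minimal sketch, not the author's method: it assumes p = 1/4 (the value implied by the marginal totals of Table 37), and the function names are illustrative.

```python
from fractions import Fraction
from math import comb, sqrt


def joint_distribution(s, t, p):
    """Exact joint probabilities P(x, y) for the two drawings of Theorem VI."""
    q = 1 - p
    joint = {}
    for x in range(s + 1):
        # x white balls among the s balls of the first drawing (binomial).
        p_x = comb(s, x) * p**x * q**(s - x)
        for k in range(t + 1):
            # k white balls among the t balls taken from the first drawing
            # (hypergeometric; math.comb returns 0 for impossible cases).
            p_k = Fraction(comb(x, k) * comb(s - x, t - k), comb(s, t))
            for j in range(s - t + 1):
                # j white balls among the s - t fresh balls from the urn.
                p_j = comb(s - t, j) * p**j * q**(s - t - j)
                joint[(x, k + j)] = joint.get((x, k + j), 0) + p_x * p_k * p_j
    return joint


def correlation(joint):
    """Product-moment correlation r computed from a joint distribution."""
    mean_x = sum(w * x for (x, _), w in joint.items())
    mean_y = sum(w * y for (_, y), w in joint.items())
    var_x = sum(w * (x - mean_x) ** 2 for (x, _), w in joint.items())
    var_y = sum(w * (y - mean_y) ** 2 for (_, y), w in joint.items())
    cov = sum(w * (x - mean_x) * (y - mean_y) for (x, y), w in joint.items())
    return float(cov) / sqrt(var_x * var_y)


joint = joint_distribution(s=5, t=3, p=Fraction(1, 4))
# Scaling by 4**7 = 16,384 turns every probability into an integer frequency.
frequencies = {cell: w * 4**7 for cell, w in joint.items()}
r = correlation(joint)
print(frequencies[(1, 1)], r)
```

Using exact fractions keeps the frequencies as whole numbers, so each cell can be compared with the table one by one before r is computed.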

Review Questions and Problems 

1. Define the following terms: statistics, variate, discrete, class interval,
class mark, x-array of y's, range, regression line, sample, universe,
coefficient of variation, variance.

2. Name and define five averages. Discuss their advantages and limitations.

3. What does a ratio chart show that a chart with a uniform scale does not? If
you wished to plot data so as to secure the effect of a ratio chart, but had
no ratio paper available, how would you accomplish the desired result?

4. Prove the following:

(a) The algebraic sum of the deviations of the variates from their mean
is zero.

(b) The second moment about an arbitrary point equals the second moment
about the mean increased by the square of the distance between
the arbitrary point and the mean.

1 Explained in Part II.

5. (a) Define and explain how to compute the following:

$Q_1$, $Q_3$, $Q$, MD, $s$, $\sigma$.

(b) In the case of a normal distribution give the value of each of the first
four constants in (a) in terms of $\bar{x}$ or $\sigma$.

6. (a) Give the equation of the normal curve in both arbitrary coordinates and 

standard units. State the relation between abscissas and between ordi- 
nates in the two systems. 

(b) State the properties of the normal curve. 

7. Show how to fit a straight line y = mx + k by the method of moments by
deriving the expressions for m and k.

8. Show how to fit an exponential function by the method explained in the 

text. 

9. Show how to fit a parabola by the method of moments. 

10. (a) Give two of the formulas for r. Discuss the use or uses of correlation in
any problem that occurs to you.

(b) Show that the slope of the line in problem 7 may be written $r\sigma_y/\sigma_x$.

11. (a) Prove that $|r| \le 1$.

(b) Define the correlation ratio. Discuss its use.

12. Discuss rank correlation. 

13. Derive the following relations:

$\bar{x} = c\bar{u} + x_0$

$\mu_2 = \nu_2 - \nu_1^2$

$\mu_{2:x} = c^2\mu_{2:u}$

$\sigma_x = c\sigma_u$.

14. The following is a reduced distribution of the breakfast checks at a cafeteria.
Using the indirect method find $\bar{x}$ and $\sigma_x$.

     x        f

    8-12      4
   13-17      8
   18-22     24
   23-27     21
   28-32     15
   33-37     14
   38-42      7
   43-47      4
   48-52      2
   53-57      1

Ans. $\bar{x} = 27.2$¢, $\sigma = 9.4$¢.
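
The indirect (coded) method asked for in Problem 14 can be sketched as follows. The sketch makes three assumptions that the problem does not state: the class marks are the midpoints 10, 15, ..., 55, the provisional origin is 25, and the unit is the class width 5. No Sheppard correction is applied.

```python
from math import sqrt


def coded_mean_sd(class_marks, freqs, origin, width):
    """Mean and standard deviation via coded deviations u = (x - origin) / width."""
    n = sum(freqs)
    u = [(x - origin) / width for x in class_marks]
    nu1 = sum(f * ui for f, ui in zip(freqs, u)) / n       # first moment of u
    nu2 = sum(f * ui * ui for f, ui in zip(freqs, u)) / n  # second moment of u
    mean = origin + width * nu1
    sd = width * sqrt(nu2 - nu1 ** 2)
    return mean, sd


class_marks = [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
freqs = [4, 8, 24, 21, 15, 14, 7, 4, 2, 1]
mean, sd = coded_mean_sd(class_marks, freqs, origin=25, width=5)
print(round(mean, 1), round(sd, 1))
```

Any other provisional origin gives the same mean and standard deviation; a central class mark is chosen only because it keeps the coded values small.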

15. Derive the relations which give the third and fourth moments about the
mean in terms of moments about an arbitrary origin. Define $\alpha_3$ and
$\alpha_4$. What information do they give?

16. Compute the value of $\alpha_3$ and of $\alpha_4$ for the distribution in Exercise 14.

17. The following is a distribution of the heights of students where x denotes
heights in inches and f is the number of students of the corresponding
heights. Find $\bar{x}$, $\sigma_x$, $\alpha_3$, and $\alpha_4$.

     x        f

    60.5      1
    62.0      3
    63.5     14
    65.0     32
    66.5     61
    68.0     80
    69.5     71
    71.0     35
    72.5     24
    74.0      2
    75.5      1


18. For N values of a variable v it is known that $\sum v = 0$ and $\sum v^2 = N$.
What are the origin and unit of v?

19. Find in two ways the value of P for which the function

$y = \sum f(x - P)^2$

has the smallest value.

20. (Walker) An algebra test was given to 400 high school children, of whom
150 were boys and 250 were girls. The results were as follows:

$n_1 = 150$          $n_2 = 250$
$\bar{x}_1 = 72.5$    $\bar{x}_2 = 73.6$
$\sigma_1 = 7.0$      $\sigma_2 = 6.4$

Find the mean and standard deviation of the combined groups.
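
One way to organize the computation for combined groups is sketched below. It assumes each standard deviation was computed with divisor n, and it reads the group figures as 150 boys with mean 72.5 and standard deviation 7.0, and 250 girls with mean 73.6 and standard deviation 6.4; the function name is illustrative.

```python
from math import sqrt


def combine_groups(groups):
    """Combine (n, mean, sd) triples into the n, mean, and sd of the pooled set."""
    total = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / total
    # Each group contributes its own variance plus the squared distance
    # of its mean from the combined mean.
    var = sum(n * (s ** 2 + (m - mean) ** 2) for n, m, s in groups) / total
    return total, mean, sqrt(var)


n, mean, sd = combine_groups([(150, 72.5, 7.0), (250, 73.6, 6.4)])
print(n, round(mean, 2), round(sd, 2))
```

The combined variance is not simply the weighted average of the two variances; the second term accounts for the separation of the group means.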

21. For a normal distribution of 1500 students' grades, $\bar{x} = 75$, $\sigma_x = 10$. What
values of x will include the middle 500 grades? How many grades were
below 60; above 00? 

22. Suppose a distribution of 1000 breakfast checks from the cafeteria mentioned
in problem 14 showed the following results: $\bar{x} = 27$¢, $\sigma_x = 9$¢,
$\alpha_3 = 0$, $\alpha_4 = 3$. On the basis of these results what is the expected
frequency in the 23-27¢ class interval?

23. Given the following data as to the heights (y) and weights (x) of college men:

$\sum y = 6{,}800$, $\sum y^2 = 463{,}025$, $\sum xy = 1{,}022{,}250$,
$\sum x = 15{,}000$, $\sum x^2 = 2{,}272{,}500$, $N = 100$.

Find $\bar{x}$, $\bar{y}$, $\sigma_x$, $\sigma_y$, r.
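
The quantities asked for in Problem 23 follow directly from the six totals. The sketch below reads them as sum of y = 6,800, sum of y squared = 463,025, sum of xy = 1,022,250, sum of x = 15,000, sum of x squared = 2,272,500, and N = 100, and uses divisor N throughout; the function name is illustrative.

```python
from math import sqrt


def summary_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Means, standard deviations, and r from raw sums (divisor n throughout)."""
    mean_x = sum_x / n
    mean_y = sum_y / n
    sd_x = sqrt(sum_x2 / n - mean_x ** 2)
    sd_y = sqrt(sum_y2 / n - mean_y ** 2)
    r = (sum_xy / n - mean_x * mean_y) / (sd_x * sd_y)
    return mean_x, mean_y, sd_x, sd_y, r


mean_x, mean_y, sd_x, sd_y, r = summary_from_sums(
    n=100, sum_x=15_000, sum_y=6_800,
    sum_x2=2_272_500, sum_y2=463_025, sum_xy=1_022_250,
)
print(mean_x, mean_y, sd_x, sd_y, r)
```

Working from totals in this way avoids a second pass over the data, which is why the same form is convenient on a computing machine.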

24. Derive the expression for the standard error of estimate,

$S_y = \sigma_y(1 - r^2)^{1/2}$.

25. Discuss the use of $S_y$ in predictions.

26. Compute the median, quartiles, and quartile deviation for the following
distribution where x = bushels per acre and f = corresponding frequency.

     x        f

     1        3
     3       26
     5       78
     7      107
     9      113
    11       65
    13       40
    15       22
    17       45
    19       41
    21       21
    23       23


27. (a) Find r for the following table using (u, v) coordinates.





■ 

23 

/( y) 

18 


3 

i 

1 

6 

15 

2 

■ 

3 

1 

10 

12 

2 

H 

l 


4 

fix) 

4 

D 

6 

2 

20 


(b) For the above data, find $\bar{x}$, $\bar{y}$, $\sigma_x$, $\sigma_y$, and the equations of the
regression lines.

28. For Table 38, (a) find the correlation coefficient, (b) find the equations of the
lines of regression, (c) locate the coordinate axes through the arithmetic
mean of the table and plot the lines obtained in (b).

29. Fit an exponential function of the type $y = Ae^{Bx}$ to the following data:

     x      0      1      4
     y      2     10    100

First find the equation in the forms

(a) Y = at + b

(b) Y = mx + k

and then determine A and B.

Table 38. Correlation Table for Monthly Rainfall at Iowa City and
Des Moines, 1890-1925

IOWA CITY (x)

[Column headings: class marks of x from 0.245 to 10.745 in steps of 0.5,
followed by f(y). Each row below begins with a Des Moines (y) class mark,
from 10.245 down to 0.245, followed by its cell frequencies and row total.]

10.245 









1 

9.745 




• 












i 







1 

9.245 

















1 

1 





n 

8.745 













l 






1 




m 

8.245 













2 


1 

1 







■ 

i 

■ 










1 











1 

2 

i 

1 









2 

1 


1 


2 

1 







B 

6.745 



2 


1 


2 


1 



1 



1 



1 

1 




10 

6.245 












1 











1 

5.745 






4 

1 


1 




1 

2 



1 






10 

5.245 



1 


1 

2 



2 


2 

1 


1 

1 




1 




12 

rai 





2 

1 

2 

2 

1 

1 

1 


1 



2 



1 




14 

EB 



1 


1 

1 

2 


1 

(] 

1 

2 

1 










14 

3.745 




4 

o 

1 

2 

0 

5 

3 

2 

2 

2 


1 








30 

3.245 


2 

1 

4 

6 

6 

3 

7 

2 


1 






1 




1 


34 

2.745 



3 

4 

1 

8 

4 

4 

2 

1 

2 



1 









30 

2.245 



1 

5 

10 

7 

6 

4 

2 

2 













37 

1.745 

1 

4 

7 

12 

13 

8 

5 

1 

1 

1 

2 


1 










56 

1.245 

3 

8 

18 

17 

6 

8 

4 

2 


1 













67 

0.745 

6 

i 

21 

12 

6 

1 

1 

1 



1 



1 









66 

0.245 

13 

12 

3 

4 



















32 

/(*> 

23 

42 

58 < 

82 

49 

47 ; 

32! 

27 : 

L8 

15: 

L4 

7 

10 

5 

6 

5 

3 

2 

5 

0 

1 

1 

432

30. How does the scatter diagram assist one in deciding whether the regression is
linear or non-linear? Give the formulas for the correlation coefficient
and for the correlation ratio of y on x, explaining the meaning of the letters
used. How would you use these indices of correlation to decide whether
the regression of y on x is linear or non-linear?

31. (a) In a normal distribution in which $\bar{x} = 0$ and $\sigma_x = 4$, what proportion
of the data lie where x > 12?

(b) If 100 of the data lie between x = -6 and x = -8, how many of the data
are there in the whole distribution?

32. (a) When the variates are ungrouped what is perhaps the best formula
for $\sigma_x$? Ans.

$$\sigma_x = \frac{1}{N}\left[N\sum x^2 - \left(\sum x\right)^2\right]^{1/2}$$

(b) What does this expression become in terms of N when x refers to the
integers from 1 to N?

33. (a) Expand $(a + b + c + d)^2$.

(b) The expansion of $(x_1 + x_2 + \cdots + x_n)^2$ consists of the sum of the
squares of the x's plus the sum of their products taken two at a time.
Express this expansion in summation notation.

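34. (a) Show that the formula for MD may be written

$$MD = \frac{2}{N}\left[\bar{x}\sum_{x_i<\bar{x}} f_i - \sum_{x_i<\bar{x}} f_i x_i\right].$$

Hint. For $x_i < \bar{x}$, $\sum f_i|x_i - \bar{x}| = -\sum f_i(x_i - \bar{x}) = \sum f_i(\bar{x} - x_i)$.

For $x_i > \bar{x}$, $\sum f_i|x_i - \bar{x}| = -\sum f_i(\bar{x} - x_i)$.

Since $\bar{x}$ is the centroid (§14, Chapter III), $-\sum f_i(\bar{x} - x_i)$ for $x_i > \bar{x}$ equals
$\sum f_i(\bar{x} - x_i)$ for $x_i < \bar{x}$.

(b) Using this formula evaluate MD for one of the distributions in the text.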

35. Given N pairs of variates: $(x_{11}, x_{21})$; $(x_{12}, x_{22})$; $(x_{13}, x_{23})$; ...; $(x_{1N}, x_{2N})$.
Show that:

(a) the mean $\bar{x}$ of all the variates is

$$\bar{x} = \frac{1}{2N}\sum_i (x_{1i} + x_{2i}),$$

(b) the variance $\sigma^2$ taken about the $\bar{x}$ in (a) is

$$\sigma^2 = \frac{1}{2N}\left[\sum_i (x_{1i} - \bar{x})^2 + \sum_i (x_{2i} - \bar{x})^2\right].$$

Note. The quantity

$$\frac{1}{N\sigma^2}\sum_i (x_{1i} - \bar{x})(x_{2i} - \bar{x}),$$

where $\bar{x}$ and $\sigma^2$ are defined as in (a) and (b), is called the intra-class
correlation coefficient. For its use see Statistical Methods for Research Workers,
Fisher (§38), Oliver and Boyd, London.

36. Let $S_k = \sum_{x=1}^{N} x^k$. Prove that $S_1 = N(N + 1)/2$,
$S_2 = N(N + 1)(2N + 1)/6$, $S_3 = S_1^2$.

37. Sketch the graph of $y = Ae^{Bx}$, $-\infty < x < \infty$, when (a) both A and B are
positive, (b) A is positive and B negative, (c) A is negative and B positive,
(d) both A and B are negative.

38. A large number of rectangles are drawn all having the same perimeter but
different bases (x) and altitudes (y). Which of the following is the correct
answer? The coefficient of correlation between x and y is (a) negative
and numerically large, (b) positive and numerically small, (c) positive
and numerically large, (d) approximately zero.

39. For N correlated values of x and y the regression equation of y on x is found
to be y = 1 + x. If $\bar{x} = 0$, r = 0.5, and $\sigma_x = 1$, determine $\bar{y}$ and $S_y$.

40. Let $NS_y^2$ denote the sum of squares of deviations from the line of least
squares (Case I).

(a) Show that $NS_y^2 = \sum y^2 - m\sum xy - k\sum y$.

Hint. $NS_y^2 = \sum (y - mx - k)^2$

$= \sum y(y - mx - k) - m\sum x(y - mx - k) - k\sum (y - mx - k)$.

The last two expressions vanish. Why?

(b) If m and k are replaced by their determinant values from (5), p. 143,
show that

$$NS_y^2 = \begin{vmatrix} \sum y^2 & \sum y & \sum xy \\ \sum y & N & \sum x \\ \sum xy & \sum x & \sum x^2 \end{vmatrix} \div D, \qquad D = \begin{vmatrix} N & \sum x \\ \sum x & \sum x^2 \end{vmatrix}.$$

The third order determinant is D bordered by $\sum y^2$, $\sum y$, $\sum xy$.

(c) If x and y are replaced by x' and y', denoting deviations from their
respective means, find the values of the resulting determinants in (b).

(d) From the results in (c) show that $S_y^2 = \sigma_y^2(1 - r^2)$.

41. Discuss the properties of the normal correlation surface and their use in
passing judgment on the reliability of predictions based upon the regression
line of y on x.

42. (For calculus students) In fitting points in a plane by a line so that the
sum of squares of perpendicular deviations shall be a minimum, a second
line may be found for which the sum of squares of perpendicular deviations
is a maximum. If $\sum d^2$ is the sum of squares of deviations from the
first line and $\sum D^2$ is the sum of squares of deviations from the second line,
show that $\sum d^2 / \sum D^2 = (1 + r)/(1 - r)$. [Reference: Bulletin American
Mathematical Society, vol. 47 (1941), p. 710.]







APPENDIX 

Tables 

I. Ordinates and Areas of the Normal Curve. 

II. Common Logarithms of Numbers to Five Decimal Places. 




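Table I. Ordinates and Areas of the Normal Curve, $\phi(t) = \frac{1}{\sqrt{2\pi}} e^{-t^2/2}$

Columns (repeated three times across each page): t, $\phi(t)$, $\int_0^t \phi(t)\,dt$
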

.00 

.39894 

.00000 

.45 

.36053 

. 17364 

.90 

.26609 

.31594 

.01 

.39892 

.00399 

46 

.35889 

. 17724 

.91 

. 26369 

.31859 

.02 

.39886 

.00798 

.47 

.35723 


.92 

.26129 

.32121 

.03 

.39876 

.01197 

.48 

.35553 

. 18439 

.93 

.25888 

.32381 

.04 

.39862 

.01595 

.49 

.35381 

. 18793 

.94 

.25647 

.32639 

.05 

.39844 

.01994 

.50 

.35207 

. 19146 

.95 

.25106 

.32894 

.06 

.39822 

.02392 

.51 

.35029 

. 19497 

96 

.25164 

.33147 

Hm 7 M 

.39797 

.02790 

.52 

.34849 

.19847 

.97 

.24923 

.33398 

.08 

.39767 

.03188 

.53 

.34667 

.20194 

.98 

.24681 

.33646 


.39733 

.03586 

.54 

.34482 


.99 

. 24439 

.33891 


.39695 

.03983 

.55 

.34294 

■ 

1.00 

.24197 

.34134 

.11 

.39654 

.04380 

.56 

.34105 

.21226 

1.01 

.23955 

.34375 

.12 

.39608 

.04776 

.57 

.33912 

.21566 

1.02 

.23713 

.34614 

.13 

.39559 

^05172 

.58 

.33718 

.21904 

1.03 

.23471 

.34850 

.14 

.39505 

.05.507 

.59 

.33521 

.22240 

1.04 

.23230 

.35083 

.15 

.39448 

.05962 

.60 

.33322 

.22575 

1.05 

.22988 

.35314 

.16 

.39387 

.06350 

.61 

.33121 

22907 

1.06 

.22747 

.35543 

.17 

m&mm 

. 00749 

.62 

.32918 

.23237 

1.07 

.22500 

.35769 

.18 

.39253 

.07142 

63 

.32713 

.23565 

1.08 

.22265 

.35993 

.19 

.39181 

.07535 

64 

.32506 

.23891 

1.09 

.22025 

.36214 

.20 

.39104 

.07926 

.65 

32297 

.24215 

1.10 

.21785 

.36433 

.21 

.39024 

.08317 

.66 

Rtti 

.24537 

1.11 

.21546 

.36650 

.22 

.38940 

.08706 

.67 

.31874 

.24857 

1.12 

.21307 

.36864 

.23 

.38853 

.09095 

.68 

.31659 

.25175 

1 . 13 

.21069 

.37076 

.24 

.38762 

.09483 

.69 

.31443 

.25490 

1.14 

.20831 

.37286 

.25 

.38667 

.09871 

W 8 M 

.31225 

.25804 

1.15 

.20594 

.37493 

.26 

.38568 

. 10257 

.71 

.31006 

.26115 

1.16 

.20357 

.37698 

.27 

.38466 

.10642 

.72 

.30785 

.26424 

1.17 

.20121 

.37900 

.28 

.38361 

.11026 

.73 



1.18 

. 19886 

.38100 

.29 

.38251 

. 11409 

.74 



1.19 

. 19652 

.38298 

.30 

.38139 

. 11791 

■9 

.30114 

.2/337 

1 20 

.19419 

.38493 

.31 

.38023 

. 12172 

mrlm 

.29887 

.27637 

1.21 

. 19186 

.38686 

.32 

.37903 

. 12552 

wan 

.29659 

.27935 

1.22 

. 18954 

.38877 

.33 

.37780 

mvmm 

.78 

.29431 

. 28230 

1.23 

. 18724 

.39065 

.34 

.37654 

. 13307 

.79 

.29200 

.28524 

1.24 

. 18494 

.39251 

.35 

.37524 

. 13683 

.80 

.28969 

.28814 

1.25 

. 18265 

.39435 

.36 

.37391 

. 140.58 

.81 

.28737 

.29103 

1.26 

. 18037 

.39617 

.37 

.37255 

. 14431 

.82 

.28504 

. 29389 

1.27 

. 17810 

.39796 

.38 

.37115 

. 14803 

.83 

.28269 

.29673 

1 1.28 

. 17585 

.39973 

.39 

.36973 

. 15173 

.84 

.28034 

.29955 

1.29 

.17360 

.40147 

.40 

.36827 

. 15542 

.85 

.27798 


1.30 

.17137 

.40320 


.36678 

.45910 

.86 


.30511 

1.31 

. 16915 

.40490 


.36526 

. 16276 

.87 

.27324 


1.32 

.16694 

.40658 


.36371 

.16640 

.88 

mmm \ 

KiuMi 

1.33 

.16474 

.40824 


.36213 

.17003 

.89 

.26848 

.31327 

1.34 

. 16256 

.40988

1.35 

. 16038 

.41149 

m 

m 

. 46407 

2.25 

.03174 

.48778 

1.36 

. 15822 

.41309 

1.81 


.46485 

2.26 

.03103 

.48809 

1.37 

. 15608 

.41466 

1.82 

HiMli 

.46562 

2.27 

.03034 

.48840 

1.38 

. 15395 

.41621 

1.83 

.07477 

.46638 

2.28 

.02965 

.48870 

1.39 

. 15183 

.41774 

1.84 

.07341 

.46712 

2.29 

.02898 

.48899 

1.40 

. 14973 

.41924 

1.85 


.46784 

2.30 

.02833 

.48928 

1.41 

. 14764 

.42073 

1.86 

Ririircl 

.46856 

2.31 

.02768 

.48956 

1.42 

. 14556 

Rnl 

1.87 

.06943 

.46926 

2.32 

.02705 

.48983 

1.43 

. 14350 

.42304 

1.88 

. 06814 

.46995 

2.33 

.02643 

.49010 

1.44 

. 14146 

.42507 

1.89 

. 00687 

.47062 

2.34 

02582 

.49036 

1.45 

. 13943 

.42047 

1.90 


.47128 

2.35 

.02522 

.49061 

1.46 

. 13742 

.42786 

1.91 

.06439 

.47193 

.2.36 

.02463 

.49086 

1.47 

. 13542 

.42922 

1.92 

06316 

.47257 

2.37 

.02406 

.49111 

1.48 

. 13344 

.43056 

1.93 

. 00195 

.47320 

2.38 

.02349 

.49134 

1.49 

. 13147 

.43189 

1.94 

. 06077 

.47381 

2.39 

.02294 

.49158 

1.50 

. 12952 

.43319 

1.95 

. 05959 

.47441 

2 40 

.02239 

.49180 

1.51 

. 12758 

.43448 

1.96 

.058-14 

. 17500 

2.41 

.02186 

.49202 

1.52 

. 12566 

.43574 

1.97 

. 05730 

.47558 

2.42 

.02134 

.49224 

1.53 

. 12376 

.43699 

1.98 

.05618 

. 47615 

2.43 

02083 

.49245 

1 54 

. 12188 

.43822 

1.99 

. 05508 

. 47670 

2.44 

.02033 

.49266 

1.55 

.12001 

.43943 

2 00 

. 05399 

. 47725 

2 45 

.01984 

.49286 

1.56 

.11816 

.44062 

2 01 

. 02592 

47778 

2 16 

.01936 

.49305 

1.57 

.11632 

.44179 

2 02 

.05186 

.47831 

2 47 

.01889 

.49324 

1.58 

.11450 

.44295 

2.03 

. 050 H 2 

. 47882 

2.48 

.01842 

.49343 

1.59 

.11270 

.44408 

2.04 

. 04980 

. 47932 

2.49 

01797 

.49361 

1.60 

.11092 

.44520 

2.05 

. 04879 

.47982 

2.50 

,01753 

.49379 

1.61 

. 10915 

.44630 

2.06 

.04780 

.48030 

2.51 

.01709 

.49396 

1.02 

. 10741 

.44738 

2.07 

. 046)82 

. 48077 

2.52 

.01667 

.49413 

1.63 

. 10567 

.44845 

2.08 

. 04580 

. 48124 

2.53 

.01625 

.49430 

1.64 

.10396 

.44950 

2.09 

.01491 

. 48169 

2.54 

.01585 

.49446 

1.65 

. 10226 

.45053 

2.10 

.04398 

.48214 

2.55 

.01545 

.49461 

1.66 

. 10059 

.45154 

2.11 

.04307 

.48257 

2.56 

.01506 

.49477 

1.67 

.09893 

.45254 

2.12 

.01217 

.48300 

2.57 

.014 G 8 

.49492 

1.68 

.09728 

.45352 

2.13 

.04128 

. 48341 

2.58 

.01431 

.49506 

1.69 

.09566 

.45449 

2. 14 

.04041 

.48382 

2.59 

.01394 

.49520 

1.70 

.09405 

.45543 

2.15 

.03955 

. 48422 

2.60 

.01358 

.49534 

1.71 

.09246 

.45637 

2.16 

.03871 

.48461 

2.61 

.01323 

.49547 

1.72 

.09089 

.45728 

2.17 

.03788 

. 48500 

2.62 

.01289 

.49560 

1.73 

.08933 

.45818 

2.18 

.03706 

. 48537 

2.63 

.01256 

.49573 

1.74 

.08780 

.45907 

2.19 

.03626 

.48574 

2.64 

.01223 

.49585 

1.75 

.08628 

.45994 

2.20 

. 03547 

.48610 

2.65 

.01191 

.49598 

1.76 

.08478 

.46080 

2.21 

.03470 

.48645 

2.66 

.01160 

.49609 

1.77 

.08329 

.46164 

2.22 

. 03394 

.48679 

2.67 

.01130 

.49621 

1.78 

.08183 

.46246 

2.23 

.03319 

.48713 

2.68 

.01100 

.49632 

1.79 

.08038 

.46327 

2.24 

.03246 

.48745 

2.69 

.01071 

.49643


2.70 

.01042 

.49653 

3.15 

.00279 

.49918 

3.60 

.00061 

.49984 

2.71 

.01014 

.49664 

3.16 

.00271 

.49921 

3.61 

.00059 

.49985 

2.72 

.00987 

.49674 

3.17 

.00262 

.49924 

3.62 

.00057 

.49985 

2.73 

.00961 

.49683 

3.18 

.00254 

.49926 

3.63 

.00055 

.49986 

2.74 

.00935 

.49693 

3.19 

.00246 

.49929 

3.64 

.00053 

.49986 

2.75 


.49702 

3.20 

.00238 


3.65 

.00051 

.49987 

2.76 

RBIil 

.49711 

3.21 

.00231 

.49934 

3 66 

.00049 

.49987 

2.77 


.49720 

3.22 

. 00224 

.49936 

3.67 

.00047 

.49988 

2.78 

mmvi 

.49728 

3.23 

.00216 

.49938 

3.68 

.00046 

.49988 

2.79 

1 

.49736 

3.24 

.00210 

.49940 

3.69 

.00044 

.49989 

2.80 


.49744 

3.25 

.00203 

.49942 

3.70 

.00042 

.49989 

2.81 

.00770 

.49752 

3.20 

.00196 

.49944 

3 71 

.00041 

.49990 

2.82 

■iTr 'c 31 

.49760 

3.27 

.00190 

.49946 

3.72 

.00039 

.49990 

2.83 

Hill r t 

.49767 

3.28 

.00184 

.49948 

3.73 

.00038 

.49990 

2.84 


.49774 

3.29 

.00178 


3.74 

.00037 

.49991 

2.85 

.00687 

.49781 

3.30 

. 00172 

49952 

3.75 

.00035 

.49991 

2.86 

■iiliMt :fl 

. 49788 . 

3 31 

. 00167 

.49953 

3.76 

.00034 

.49992 

2.87 

mm j 

. 49795 ' 

3.32 

. 00161 

.49955 

3.77 

00033 

.49992 

2.88 

whim 1 

KM 01 

3.33 

. 00156 

.49957 

3 78 

00031 

.49992 

2.89 

.00613 

.49807 

3.34 

.00151 

.49958 

3.79 

.00030 

.49992 

2.90 

.00595 

49813 

3.35 

00146 

49960 

3 80 

.00029 

.49993 

2.91 

00578 

49819 

3.36 

00141 

.49961 

3.81 

.00028 

.49993 

2.92 

.00562 

.19825 

3.37 

00136 

.49962 

3 82 

.00027 

.49993 

2.93 

.00545 

.49831 

3.38 

.00132 

.49964 

3.83 

.00026 

49994 

2.94 

.00530 

. 49836 

3.39 

.00127 

49965 

3.84 

00025 

.49994 

2.95 

.00514 

.49841 

3.40 

.00123 

.49966 

3.85 

.00024 

.49994 

2.96 

.00499 

.19846 

3 41 

.00119 

49968 

3.86 

.00023 

.49994 

2.97 

.00485 

.49851 

3 42 

.00115 

.49969 

3.87 

.00022 

.49995 

2.98 

.00471 

.49856 

3.43 

.00111 

.49970 

3.88 

.00021 

.49995 

2.99 

.00457 

.49861 

3.44 

.00107 

. 49971 

3.89 

.00021 

.49995 

3.00 

KIRI 

.49865 

3.45 

.00104 

.49972 

3.90 

.00020 

.49995 

3.01 

Hiiif&fll 

.49869 

3.46 

.00100 

.49973 

3.91 

1 .00019 

.49995 

3.02 

.00417 

.49874 

3.47 

. 00097 

.49974 

3.92 

.00018 

.49996 

3.03 

.00405 

.49878 

3.48 

.00094 

.49975 

3.93 

.00018 

.49996 

3.04 

.00393 

.49882 

3.49 

.00090 

49976 

3.94 

.00017 

.49996 

3.05 

.00381 

.49886 

3.50 

.00087 

.49977 

3.95 

.00016 

.49996 

3.06 

.00370 

.49889 

3.51 

.00084 

.49978 

3.96 

.00016 

.49996 

3.07 

.00358 

.49893 

3.52 

.00081 

.49978 

3.97 

.00015 

.49996 

3.08 

.00348 

.49897 

3.53 

.00079 

.49979 

3.98 

.00014 

.49997 

3.09 

.00337 

.49900 

3.54 

.00076 

.49980 

3.99 

.00014 

.49997 

3.10 

.00327 

.49903 

3.55 

.00073 

.49981 




3.11 

.00317 

.49906 

3.56 

.00071 

.49981 




3.12 

.00307 

.49910 

3 57 

.00068 

.49982 




3.13 

.00298 

.49913 

3.58 

.00066 

. 49983 




3.14 

.00288 

.49916 

3.59 

.00063 

.49983

Table II. Common Logarithms of Numbers to Five Decimal Places

Reprinted by permission from "Plane Trigonometry" by Simmons and Gore, John Wiley
& Sons, Inc.


Prop. Parts 


29 28 

2.9 2.8 

5.8 5.6 
8.7 8.4 

11.6 11.2 

14.5 14.0 

17.4 16.8 

20.3 19.6 

23.2 22.4 
26.1 25.2 


27 29 

2.7 2.6 

5.4 6.2 
8.1 7.8 

10.8 10.4 

13.6 13.0 

16.2 16.6 

18.9 18.2 

21.6 20.8 

24.3 23.4 


2.5 

6.0 

7.6 

10.0 

12.6 

16.0 

17.6 

20.0 

22.6 


2ft 28 

2.4 2.3 

4.8 4.6 

7.2 6.9 

9.6 9.2 

12.0 11.6 

14.4 13.8 

16.8 16.1 

19.2 18.4 

21.6 20.7 


22 21 

2.2 2.1 

4.4 4.2 

6.6 6.3 

8.8 8.4 

11.0 10.5 

13.2 12.6 

15.4 14.7 

17.6 16.8 

19.8 18.9 


Prop. Parti 


_N 






JS 





150 


638 

667 

696 

725 

754 

782 

811 

840 

869 


17 898 

926 

955 

98 

*01 

"041 

*070 

*099 

*127 

*156 


18 184 

2L 

241 

270 

298 

327 

35 

384 

412 

441 


469 

498 

526 

554 

583 

611 

639 

66; 

696 

724 


18 752 

780 

808 

83: 

865 

893 

921 

949 

971 

*005 


19 033 

061 

089 

117 

14, 

173 

201 

229 

251 

285 


312 

340 

368 

396 

424 

451 

479 

507 

535 

562 


590 

618 

645 

673 

700 

728 

, 756 

783 

811 

838 


19 866 

893 

921 

948 

976 *003 *030 

‘058 


112 


20 140 

167 

194 

222 

249 

276 

303 

330 


385 

160 

412 

439 

466 

49; 

520 

548 

575 

602 

629 

656 


683 

710 

737 

763 

790 

817 

844 

871 

898 

925 


20 952 

978 

*005 *032 

*059 

*085 

"112 

"139 

"165 

*192 


21 219 

245 

272 

299 

325 

352 

378 

405 

431 

458 


484 

511 

537 

564 

590 

617 

643 

669 

696 

722 


21 748 

775 

801 

827 

854 

880 

906 

932 

958 

985 


22 011 

037 

063 

089 

115 

141 

167 

194 

220 

246 

■I 


272 

298 

324 

350 

376 

401 

427 

453 

479 

505 H 


531 

55 7 

583 

608 

634 

660 

686 

712 

737 

763 I 


22 789 

814 

840 

866 

891 

917 

943 

968 

994 :*019 1| 

170 

23045 

070 

096 

121 

147 

172 

198 

223 

249 

274|| 


300 

325 

350 

376 

401 

426 

452 

477 

502 

528 


553 

578 

603 

629 

654 

679 

704 

729 

754 

779 


23 805 

830 

855 

880 

905 

930 

955 

980 

'005 

"030 


24 055 

080 

105 

130 

155 

180 

204 

229 

254 

279 


304 

329 

353 

378 

403 

428 

452 

477 

502 

527 


551 

576 

601 

625 

650 

674 

699 

724 

748 

773 


24 797 

822 

846 

871 

895 

920 

944 

969 

993 

‘018 


25 042 

066 

091 

115 

139 

164 

188 

212 

237 



28 5 

310 

334 

358 

382 

406 

431 

455 

479 


180 

527 

551 

575 

600 

624 

648 | 672 

696 

720 



25 768 

792 

816 

840 

864 

888 | 

912 

935 

959 

983 


26 007 

031 

055 

079 

102 

126 

150 

174 

198 

221 


245 

269 

293 

316 

340 

364 

387 

411 

435 

458 


482 

505 

529 

553 

576 

600 

623 

647 

670 

694 


717 

741 

764 

788 

811 

834 

858 

881 

905 

928 


26 951 

975 

998 1*021 

045 

068 *091 

114 

138 

161 


27 184 

207 

231 

254 

277 

300 

323 

346 

370 

393 


416 

439 

462 

485 

508 

531 

554 

577 

600 

623 


646 

669 

692 

715 

738 

761 

784 

807 

830 

852 

190 

27 875 

898 

921 

944 

967 

989 *012 

'035 

058 

081 

91 

28 103 

126 

149 

171 

194 

217 

240 

262 

285 

307 

92 

330 

353 

375 

398 

421 

443 

466 

488 

511 

533 

93 

556 

578 

601 

623 

646 

668 

691 

713 

735 

758 

94 

28 780 

803 

825 

847 

870 

892 

914 

937 

959 

981 

95 

29 003 

026 

048 

070 

092 

115 

137 

159 

181 

203 

96 

226 

248 

270 

292 

314 

336 

358 

380 

403 

425 

97 

447 

469 

491 

513 

535 

557 

579 

601 

623 

645 

98 

667 

688 

710 

732 

754 

776 

798 

820 

842 

863 

99 

29 885 

907 

929 

951 

973 

994 *016 

038 

060 

081 

200 

30103 

125 

146 

168 

190 

211 

233 

255 

276 

298


Prop. Parts 




51 

531 

543 

555 

568 

580 

593 

52 

654 

667 

679 

691 

704 

716 

53 

777 

790 

m ig 

814 

827 

839 

54 

54 900 

913 

937 

949 

962 

55 

55 023 

035 

047 

060 

072 

084 

56 

145 

157 

169 

182 

194 

206 

67 

267 

279 

291 

303 

315 

328 

58 

388 

400 

413 

425 

437 

449 

59 

509 

522 

534 

546 

558 

570 


582 594 606 618 



61 751 

62 871 

63 55 991 


64 56 110 122 

65 229 241 

66 348 360 

67 467 478 

68 585 597 

69 703 714 


71 156 937 

72 157 054 

73 


787 799 
907 | 919 
*038 



549 56 1 
667 679 
785 797 


91 218 

92 329 

93 439 


984 

996 

101 

113 

217 

229 

334 

345 

449 

461 

565 

576 

680 

692 

795 

807 

910 

921 

*024 

*035 

138 

149 

252 

1*63 

365 

377 t 

478 

490 

591 

602 

704 

715 

816 

827 

928 

939 

*040 

*051 

151 

162 


380 392 
496 507 
611 623 

726 738 
841' 852 
967 


561 

572 

583 

594 

605 

671 

682 

693 

704 

715 

780 

791 

802 

813 

824 

890 

999 

108 

901 

*010 

119 

iH 

923 

*032 

141 

934 

*043 

152 


161 

172 

274 

286 

388 

399 

501 

512 

614 

625 

726 

737 

838 

850 

950 

961 

*062 

*073 

173 

184 

284 

295 

395 

406 

506 

517 

616 

627 

726 

737 

835 

846 

945 

956 

*054 

*065 

163 

173 


Prop. Parts


818 823 827 832 
864 868 873 877 

909 914 918 923 


64 97 955 959 964 968 973 

66 98 000 005 009 014 019 

66 046 050 055 059 064 



855 869 
900 905 
946 950 

991 996 
037 041 
082 087 


109 114 118 123 127 132 

155 159 164 168 173 177 

200 204 209 214 218 223 


290 295 
336 340 
381 385 

426 430 
471 
516 520 

561 565 
605 610 
650 655 


193 

198 

202 

238 

242 

247 

282 

286 

291 

326 

330 

335 

370 

374 

379 

414 

419 

423 

458 

463 

467 

502 

506 

511 

546 

550 

| 555 


607 

612 

616 

621 

1 651 

656 

660 

664 

695 

699 

704 

708 

739 

743 

747 

752 

782 

787 

791 

795 

826 

830 

835 

839 

870 

874 

878 

883 

913 

917 

922 

926 

99 957 

961 

965 

970 


629 634 
673 677 
717 721 


642 647 
686 691 
730 734 

774 778 
817 822 
861 865 

904 909 
948 952 
991 996 












































INDEX 


Arithmetic mean, 33 
short methods of computing, 39 
of sub-sets, 44, 193 
Array, 193 

Asymmetry, see skewness 
Averages, Chapter III 
discussion of different, 51, 52-58

Average deviation, see mean devi- 
ation 

Burr, I. W., 111 ft. nt.

Charlier check, 66, 87 
Charts, 24 
ratio, 157 

Classification of data, 9-15 
Class 

boundary, 15 
interval, 11 
limits, 15 
marks, 11 
mid-value of, 11
Coefficient 

of alienation, 185 
of correlation, Chapter VII 
of variation, 90 
Collateral reading, 5 
Combination of sets, 99 
Compound interest law, 156 
Computing machines, 4, 71 
Constant, 7 
Correlation 
and regression, 178 
coefficient, Chapter VIII 
rank, 222 
ratio, 212 

relation to common causes, 
225 

interpretation of, 225 


intraclass, 232 
surface, 208 
table, 189 

Cumulative frequencies, 16, 27, 
132 

Curve of error, see normal curve 
Curve fitting, Chapter VII, 124 
Curves of growth, 53, 152, 164, 166 

Deviation, 36 
mean or average, 84 
root-mean-square, 87 ft. nt., 99 

Dispersion, see measures of dispersion 
relative, 90 
Dwyer, P. S., 176 
Estimate, standard error of, 179 
Frequency 
curves, 25, 112 
distributions, Chapter I 
graphical representation of, Chapter II 
polygon, 24 

Function, definition, 22 
exponential, 152 
frequency, 112 
linear, 137 
parabolic, 162 
quadratic, 138 
Geometric mean, 52 
Gompertz curve, 164 
Graduation by means of normal curve, 128 

Graphical representation, Chapters II, VII 
Harmonic mean, 55 
Histogram, 25 
Hotelling, H., 167 ft. nt. 
Huntington, E. V., 167 
Kendall, M. G., 51 ft. nt. 
Kurtosis, 73, 109 
Least-squares method, 144 
Logarithmic paper, 161 
Logistic curve, 166 
Makeham’s law, 167 
Mean 

arithmetic, 33 
geometric, 52 
harmonic, 55 
of means, 43 
Mean deviation, 84 
Measures of dispersion, Chapter V 

mean deviation, 84 
quartiles, 82 

semi-interquartile range, 82 
standard deviation, 86 
Median, 47 
Mode, 47 

Moment of a distribution, Chapter IV 

method of, 141 
Normal curve, Chapter VI 
explanation of tables of, 116 
fitted to observed data, 124 
properties of, 118- 
standard form of, 115 
Normal equations, 145 
Ogive, 27 

Parabola, fitting a, 162 
Parameter, 115, 124, 141, 153 
Percentiles, 84 
Probability, 131 
Probability paper, 132 
Quartiles, 82 
of normal curve, 119 
Range, 16 
Ratio charts, 157 
Reed-Pearl curve, 166 


Regression 
coefficients, 178 
linear, 177 

non-linear, 212 
testing linearity of, 217 
Residuals, 144 
Rietz, H. L., 114 ft. nt. 

Scatter diagram, 171 
Semi-logarithmic paper, 157 
Sheppard’s corrections, 78, 88 
Shewhart, W. A., 75 
Skewness, 73, 109 
Snedecor, G. W., 178 
Standard units, 69 
Statistic, 124 

Standard deviation, 68 
of combination of sets, 99 
of grouped data, 86 
of ungrouped data, 93 
Straight line, 137 
fitting to data, 140 
Symmetry, 73, 109 
Tables 

areas under normal curve, Appendix 
logarithms of numbers, Appendix 
ordinates of normal curve, Appendix 
Tabulation, 9 
Time series, 150 
Translation of axes, 36 
Trend, 150, 162 
Variability, see dispersion 
Variable, 7 
Variance, 87 
Variates, 7 

Walker, Helen M., 199 
Weighted mean, 33 
Wilkens, J. E., 74 ft. nt. 

Wilson, E. B., 228 










